Abstract

Universally achievable error exponents pertaining to certain families of channels (most notably, discrete memoryless channels (DMC's)), and various ensembles of random codes, are studied by combining the competitive minimax approach, proposed by Feder and Merhav, with the Chernoff bound and Gallager's techniques for the analysis of error exponents. In particular, we derive a single-letter expression for the largest, universally achievable fraction $\xi^*$ of the optimum error exponent pertaining to the optimum ML decoding. Moreover, a simpler single-letter expression for a lower bound to $\xi^*$ is presented. To demonstrate the tightness of this lower bound, we use it to show that $\xi^* = 1$ for the binary symmetric channel (BSC), when the random coding distribution is uniform over: (i) all codes (of a given rate), and (ii) all linear codes, in agreement with well-known results. We also show that $\xi^* = 1$ for the uniform ensemble of systematic linear codes, and for that of time-varying convolutional codes in the bit-error-rate sense. For the latter case, we also show how the corresponding universal decoder can be efficiently implemented using a slightly modified version of the Viterbi algorithm which employs two trellises.

Index Terms: error exponent, universal decoding, generalized likelihood ratio test, channel uncertainty, competitive minimax, Viterbi algorithm, maximum mutual information decoding.
1 Introduction



In many real-life situations, encountered in digital coded communication systems, channel 
variability and uncertainty prohibit the use of the optimum maximum likelihood (ML) 
decoder, and so, universal decoders, independent of the unknown channel parameters, are 
sought. 

The topic of universal coding and decoding for unknown channels has received considerable attention in the last three decades. In [5], Goppa offered the maximum mutual information (MMI) decoder, which decides in favor of the code vector with maximum empirical mutual information with the channel output. Goppa showed that for DMC's, MMI decoding achieves capacity. Csiszár and Körner [2] also explored the universal decoding problem for DMC's with finite input and output alphabets. They showed that, with a uniform random coding distribution over a type class, the random coding error exponent of the optimum decoder is universally achievable. Csiszár [1] proved that for any channel within the class of DMC's with additive noise, and the uniform random coding distribution over linear codes, the optimum error exponent is achievable by a decoder minimizing the empirical entropy of the noise, universally for all the channels in the class. Ziv [12] explored the universal decoding problem for finite-state channels with finite input and output alphabets, for which the next channel state is a deterministic (but unknown) function of the current channel state and the current inputs and outputs. For codes governed by a uniform random coding distribution over a given set, he proved that a decoder based on the Lempel-Ziv algorithm asymptotically achieves the error exponent associated with ML decoding. In [6], Ziv and Lapidoth proved that the latter decoder is universal for a wider class of finite-state channels. In [3], Feder and Lapidoth found sufficient conditions for families of channels to have universal decoders that asymptotically achieve the random coding error exponent associated with ML decoding.

Universal coding and decoding were also explored with regard to the generalized likelihood ratio test (GLRT). In this approach, each message is scored according to the maximum likelihood (over the parameter space) of the channel output vector given the message, and a decision is made in favor of the message that attains the highest maximum likelihood.

Although provably optimum in certain asymptotic situations [11], [2, p. 165, Theorem 5.2], there are cases where the GLRT is strictly suboptimum [6, Sect. III, pp. 1754-1755], [4, Appendix].

The competitive minimax criterion, first presented in [4], is an attempt at a general methodological approach to the problem of universal decoding. According to this approach, the criterion is the minimum (over all decision rules) of the maximum (over all channels in the family) of the ratio between the error probability associated with a given channel and a given decision rule, and the error probability of the ML decoder for that channel raised to some power $\xi \in [0,1]$ (cf. eq. (2) below). The largest power $\xi = \xi^*$ such that the value of this minimax ratio does not grow exponentially with the block length is the maximum universally achievable fraction of the ML error exponent.

The main contribution of this paper is in deriving a single-letter expression for $\xi^*$, in terms of the rate $R$ and a general random coding distribution, for fairly general families of channels and ensembles of random codes. While in previous works universality was proved for certain channel models (e.g., finite-state channels) and random coding distributions (e.g., the uniform distribution over a given type class), this work deals with general families of DMC's (cf. Sect. II) and general random coding distributions (cf. eq. (7)). We note that a similar technique can be used to extend the result for $\xi^*$ to other channel families, e.g., Markov channels and finite-state channels.
In addition, a single-letter expression for a lower bound to $\xi^*$ is presented, which is simpler to work with and is believed to be tight. This lower bound holds also for random coding distributions over ensembles of linear codes and systematic linear codes. The tightness of this lower bound is demonstrated for the case of the BSC. For this model, we show that $\xi^* = 1$ when the random coding distribution is uniform over all codes and over all linear codes, in agreement with well-known results. We also show that $\xi^* = 1$ for the ensemble of systematic linear codes, and for that of time-varying convolutional codes in the bit-error-rate sense. Using the fact that, in the case of the BSC, the minimax decoding metric degenerates to a simpler metric, we propose an efficient implementation based on a slightly modified version of the Viterbi algorithm.

The outline of the paper is as follows. In Section II, we establish the notation that will be used throughout the paper and provide a formal definition of the universal decoding problem. In Section III, the main results are stated and discussed. Section IV contains a detailed proof of the single-letter expression for $\xi^*$. In Section V, the tightness of the lower bound to $\xi^*$ is demonstrated for the case of the BSC with an unknown crossover probability. In Section VI, we prove that for the ensemble of time-varying convolutional codes and the BSC with an unknown crossover probability, the minimax decoder achieves the same bit error exponent as the ML decoder, which is used when the parameter is known.

2 Notation and Problem Definition 

Throughout this paper, scalar random variables (RV's) will be denoted by capital letters, their sample values will be denoted by the respective lower case letters, and their alphabets will be denoted by the respective calligraphic letters. A similar convention will apply to random vectors of dimension $N$ and their sample values, which will be denoted with the same symbols in boldface font. The set of all $N$-vectors with components taking values in a certain alphabet will be denoted as the same alphabet superscripted by $N$.
Information theoretic quantities like entropies, conditional entropies, and mutual informations will be denoted following the usual conventions of the information theory literature, e.g., $H(X)$, $H(X|Y)$, $I(X;Y)$, and so on. With a slight abuse of notation, when we wish to emphasize the dependence of the entropy on the underlying probability distribution $P$, we denote it by $H(P)$.

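The mutual information between the input and the output of the channel $\{P_\theta(y|x),\ x\in\mathcal{X},\ y\in\mathcal{Y}\}$, when the input is governed by $Q$, will be denoted by
$$ I_\theta(Q) = \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} Q(x)P_\theta(y|x)\ln\frac{P_\theta(y|x)}{\sum_{x'\in\mathcal{X}}Q(x')P_\theta(y|x')}, $$
and the capacity of the channel will be denoted by $C_\theta = \max_Q I_\theta(Q)$.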
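As a small numerical illustration of $I_\theta(Q)$ and $C_\theta$, the Python sketch below evaluates the mutual information (in nats) of a DMC given as a transition matrix, and approximates the capacity of a binary-input channel by a grid search over input distributions $Q=(q,1-q)$. The function names and the grid search are illustrative assumptions; for the BSC the grid contains the uniform input, which is optimal, so the estimate agrees with $\ln 2 - h(\theta)$, where $h$ is the binary entropy function in nats.

```python
import math

def mutual_information(Q, W):
    """I_theta(Q) in nats for a DMC with input distribution Q[x]
    and transition matrix W[x][y] = P_theta(y|x)."""
    n_in, n_out = len(Q), len(W[0])
    # output distribution P(y) = sum_x Q(x) W(y|x)
    Py = [sum(Q[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
    total = 0.0
    for x in range(n_in):
        for y in range(n_out):
            if Q[x] > 0 and W[x][y] > 0:
                total += Q[x] * W[x][y] * math.log(W[x][y] / Py[y])
    return total

def bsc(theta):
    """Transition matrix of a BSC with crossover probability theta."""
    return [[1 - theta, theta], [theta, 1 - theta]]

def capacity_binary_input(W, grid=1000):
    """Crude capacity estimate: maximize I(Q) over Q = (q, 1-q) on a grid."""
    return max(mutual_information([q, 1 - q], W)
               for q in (i / grid for i in range(grid + 1)))
```

For example, `capacity_binary_input(bsc(0.11))` returns approximately 0.3466 nats.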
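The number of occurrences of a letter $a\in\mathcal{X}$ in a vector $\mathbf{x}\in\mathcal{X}^N$ will be denoted by $N_{\mathbf{x}}(a)$. The empirical distribution of $\mathbf{x}$ will be denoted by $P_{\mathbf{x}} = \{P_{\mathbf{x}}(a) = N_{\mathbf{x}}(a)/N,\ a\in\mathcal{X}\}$. The type class of $\mathbf{x}$ is defined as $T_{\mathbf{x}} = \{\mathbf{x}' : P_{\mathbf{x}'} = P_{\mathbf{x}}\}$, and $H_{\mathbf{x}}(X) = -\sum_{a\in\mathcal{X}}P_{\mathbf{x}}(a)\ln P_{\mathbf{x}}(a)$ will denote the entropy of a RV $X$ with distribution $P_{\mathbf{x}}$. Similarly, the number of occurrences of a letter pair $(a,b)\in\mathcal{X}\times\mathcal{Y}$ in the vector pair $(\mathbf{x},\mathbf{y})$ will be denoted by $N_{\mathbf{xy}}(a,b)$; $P_{\mathbf{xy}} = \{P_{\mathbf{xy}}(a,b) = N_{\mathbf{xy}}(a,b)/N,\ (a,b)\in\mathcal{X}\times\mathcal{Y}\}$ will denote the joint empirical distribution of $(\mathbf{x},\mathbf{y})$; $T_{\mathbf{xy}} = \{(\mathbf{x}',\mathbf{y}') : P_{\mathbf{x}'\mathbf{y}'} = P_{\mathbf{xy}}\}$ will stand for the joint type class of $(\mathbf{x},\mathbf{y})$; and $H_{\mathbf{xy}}(X,Y) = -\sum_{(a,b)\in\mathcal{X}\times\mathcal{Y}}P_{\mathbf{xy}}(a,b)\ln P_{\mathbf{xy}}(a,b)$ will denote the joint entropy of RV's $(X,Y)$ with joint distribution $P_{\mathbf{xy}}$. We will use $T_{\mathbf{x}|\mathbf{y}} = \{\mathbf{x}' : P_{\mathbf{x}'\mathbf{y}} = P_{\mathbf{xy}}\}$ to denote the conditional type class of $\mathbf{x}$ given $\mathbf{y}$; $P_{\mathbf{x}|\mathbf{y}}(a|b) = N_{\mathbf{xy}}(a,b)/N_{\mathbf{y}}(b)$, $(a,b)\in\mathcal{X}\times\mathcal{Y}$, to denote the conditional empirical distribution; and $H_{\mathbf{xy}}(X|Y) = -\sum_{(a,b)\in\mathcal{X}\times\mathcal{Y}}P_{\mathbf{xy}}(a,b)\ln P_{\mathbf{x}|\mathbf{y}}(a|b)$ to denote the conditional entropy of $X$ given $Y$ induced by the joint distribution $P_{\mathbf{xy}}$. The empirical mutual information between RV's $X$ and $Y$ with joint distribution $P_{\mathbf{xy}}$ will be denoted by $I_{\mathbf{xy}}(X;Y) = H_{\mathbf{x}}(X) - H_{\mathbf{xy}}(X|Y)$.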
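The empirical quantities above can be computed directly from a pair of sequences. This Python sketch (function names hypothetical) forms the joint type $P_{\mathbf{xy}}$ and returns $H_{\mathbf{x}}(X)$, $H_{\mathbf{xy}}(X|Y)$ and $I_{\mathbf{xy}}(X;Y)$ in nats, following the definitions term by term.

```python
import math
from collections import Counter

def joint_type(x, y):
    """Joint empirical distribution P_xy(a, b) = N_xy(a, b) / N."""
    N = len(x)
    assert N == len(y) and N > 0
    return {ab: c / N for ab, c in Counter(zip(x, y)).items()}

def marginals(Pxy):
    """Marginal types P_x and P_y of a joint type."""
    Px, Py = Counter(), Counter()
    for (a, b), p in Pxy.items():
        Px[a] += p
        Py[b] += p
    return Px, Py

def empirical_quantities(x, y):
    """Return (H_x(X), H_xy(X|Y), I_xy(X;Y)) in nats."""
    Pxy = joint_type(x, y)
    Px, Py = marginals(Pxy)
    Hx = -sum(p * math.log(p) for p in Px.values() if p > 0)
    # P_{x|y}(a|b) = P_xy(a, b) / P_y(b)
    Hx_given_y = -sum(p * math.log(p / Py[b]) for (a, b), p in Pxy.items())
    return Hx, Hx_given_y, Hx - Hx_given_y
```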

The expectation of a function $F(X,Y)$, where $X$ and $Y$ are RV's distributed according to the joint empirical distribution of $\mathbf{x}$ and $\mathbf{y}$, will be denoted by
$$ E_{\mathbf{xy}}\{F(X,Y)\} = \sum_{a\in\mathcal{X}}\sum_{b\in\mathcal{Y}}P_{\mathbf{xy}}(a,b)F(a,b). $$
The notation $E_Q\{F(\mathbf{X})\}$ will be used for the expectation of a function $F(\mathbf{X})$, where the random vector $\mathbf{X}$ is governed by $Q$.

The Hamming distance between two vectors $\mathbf{x}$ and $\mathbf{y}$ will be denoted by $d(\mathbf{x},\mathbf{y})$, and its normalization by $N$ will be denoted by $\delta(\mathbf{x},\mathbf{y})$. For a finite set $A$, $|A|$ will stand for its cardinality. The divergence between two probability measures $P$ and $Q$ over an alphabet $\mathcal{U}$ will be denoted by $D(P\|Q) = \sum_{u\in\mathcal{U}}P(u)\ln\frac{P(u)}{Q(u)}$, where $0\ln 0$ and $0\ln\frac{0}{0}$ are defined as 0, and $P\ln\frac{P}{0}$ for $P>0$ is defined as $\infty$. For two positive sequences $\{A_N\}_{N\ge1}$ and $\{B_N\}_{N\ge1}$, the notation $A_N \doteq B_N$ will express the fact that $\{A_N\}_{N\ge1}$ and $\{B_N\}_{N\ge1}$ are of the same exponential order, i.e.,
$$ \lim_{N\to\infty}\frac{1}{N}\ln\frac{A_N}{B_N} = 0. $$

Consider a DMC with a finite input alphabet $\mathcal{X}$, a finite output alphabet $\mathcal{Y}$, and single-letter transition probabilities $\{P_\theta(y|x),\ x\in\mathcal{X},\ y\in\mathcal{Y}\}$, where $\theta$ is an unknown parameter vector taking values in some set $\Theta$. The channel is fed by an input vector of length $N$, $\mathbf{x}\in\mathcal{X}^N$, and generates an output vector $\mathbf{y}\in\mathcal{Y}^N$ according to $P_\theta(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^N P_\theta(y_i|x_i)$. A rate-$R$ block code of length $N$ consists of $M = e^{NR}$ $N$-vectors $\mathbf{x}_m\in\mathcal{X}^N$, $0\le m\le M-1$, representing $M$ different messages. A decoder is a partition of $\mathcal{Y}^N$ into $M$ regions, $\Omega_0, \Omega_1, \ldots, \Omega_{M-1}$, such that if $\mathbf{y}$ falls into $\Omega_m$, a decision is made in favor of message $m$.

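Given a code $\mathcal{C}$, the competitive minimax criterion [4] is defined as
$$ S_N = \min_{\Omega}\max_{\theta\in\Theta}\frac{P_E(\Omega|\theta)}{[P_E^*(\theta)]^{\xi}}, \tag{2} $$
where $P_E(\Omega|\theta) = \frac{1}{M}\sum_{m=0}^{M-1}\sum_{\mathbf{y}\in\Omega_m^c}P_\theta(\mathbf{y}|\mathbf{x}_m)$ is the error probability related to a decoder $\Omega$ for a given value of $\theta$, and $P_E^*(\theta) = \min_\Omega P_E(\Omega|\theta)$ is the ML decoding error probability when $\theta$ is known.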
The ratio $P_E(\Omega|\theta)/[P_E^*(\theta)]^{\xi}$ designates the loss in error probability caused by using a universal decoder, which is ignorant of $\theta$, relative to the optimal ML decoding for that $\theta$. The parameter $\xi$ can be interpreted as the fraction of the optimal error exponent to which the universal decoder error exponent is compared. In order to minimize this loss uniformly over all $\theta\in\Theta$, a decoder $\Omega$ which minimizes the worst case of that ratio (i.e., its maximum over $\theta$) is sought.

As $S_N$ addresses the ratio between the error probabilities, it corresponds to the difference between the associated error exponents. It is well known that for most channels, the decoding error probability decays exponentially with the block length $N$. Therefore, if the value of $S_N$ for the decision rule achieving the minimum in (2) grows sub-exponentially with $N$, i.e., $\lim_{N\to\infty}\frac{1}{N}\ln S_N \le 0$, then, uniformly over $\theta$, the error probability associated with $\Omega$ decays exponentially at a rate which is at least a fraction $\xi$ of the exponential rate of $P_E^*(\theta)$.

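In [4], the following decision rule has been shown to be asymptotically optimal in the minimax sense for a given $\xi$:
$$ \Omega_m = \{\mathbf{y} : f(\mathbf{x}_m,\mathbf{y}) > f(\mathbf{x}_{m'},\mathbf{y}),\ \forall m'\ne m\}, \tag{3} $$
with ties broken arbitrarily, where
$$ f(\mathbf{x},\mathbf{y}) = \max_{\theta\in\Theta}f_\theta(\mathbf{x},\mathbf{y}), \tag{4} $$
$$ f_\theta(\mathbf{x},\mathbf{y}) = \frac{1}{N}\ln P_\theta(\mathbf{y}|\mathbf{x}) + \xi E^*(\theta), \tag{5} $$
and $E^*(\theta)$ stands for the asymptotic exponent associated with $P_E^*(\theta)$. A decoder $\Omega$ defined by (3) will be called the minimax decoder hereafter.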
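To make the decision rule (3)-(5) concrete, here is a minimal Python sketch for a BSC family with $\theta$ restricted to a finite grid in $(0,1/2]$. Purely for illustration, it assumes that $E^*(\theta)$ is replaced by Gallager's random-coding exponent of the BSC with uniform inputs, $E_r(R) = \max_{0\le\rho\le1}\big[\rho\ln2-(1+\rho)\ln(\theta^{1/(1+\rho)}+(1-\theta)^{1/(1+\rho)})-\rho R\big]$ in nats; the grid, the helper names, and this substitution are assumptions, not the paper's construction. For the BSC, $\frac1N\ln P_\theta(\mathbf{y}|\mathbf{x}) = \delta\ln\theta+(1-\delta)\ln(1-\theta)$, so the metric depends on $(\mathbf{x},\mathbf{y})$ only through $\delta(\mathbf{x},\mathbf{y})$.

```python
import math

def gallager_Er_bsc(R, theta, grid=200):
    """Random-coding exponent (nats) of a BSC(theta) with uniform inputs:
    max over 0 <= rho <= 1 of E0(rho) - rho*R, evaluated on a grid of rho."""
    best = 0.0  # rho = 0 always contributes 0
    for i in range(1, grid + 1):
        rho = i / grid
        a = 1.0 / (1.0 + rho)
        E0 = rho * math.log(2) - (1 + rho) * math.log(theta ** a + (1 - theta) ** a)
        best = max(best, E0 - rho * R)
    return best

def minimax_decode(codebook, y, thetas, xi, R):
    """Index m maximizing f(x_m, y) = max_theta [(1/N) ln P_theta(y|x_m) + xi*E(theta)],
    with E(theta) stood in by gallager_Er_bsc(R, theta) (an assumption)."""
    E = {t: gallager_Er_bsc(R, t) for t in thetas}  # computed once per theta
    N = len(y)

    def f(x):
        delta = sum(a != b for a, b in zip(x, y)) / N  # normalized Hamming distance
        return max(delta * math.log(t) + (1 - delta) * math.log(1 - t) + xi * E[t]
                   for t in thetas)

    scores = [f(x) for x in codebook]
    return max(range(len(codebook)), key=scores.__getitem__)  # ties: lowest index
```

With $\theta$ restricted to $(0,1/2]$, the metric is non-increasing in $\delta$, so in this toy setting the decoder picks a nearest codeword.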

A natural question that may arise at this point concerns the choice of the free parameter $\xi$. As mentioned above, the main guideline proposed in [4] is to seek the maximum value $\xi^*$ of $\xi$ such that $S_N$ would still grow sub-exponentially with $N$.

In the random coding regime, the error probabilities in the numerator and the denominator of (2) are replaced by the corresponding average error probabilities, i.e.,
$$ \bar{S}_N = \min_{\Omega}\max_{\theta\in\Theta}\frac{\bar{P}_E(\Omega|\theta)}{[\bar{P}_E^*(\theta)]^{\xi}}, \tag{6} $$
and the decoder (3) is used, with $E^*(\theta)$ replaced by $\bar{E}^*(\theta)$, the random coding error exponent associated with $\bar{P}_E^*(\theta)$.

The main purpose of this paper is to translate the above-mentioned guideline for the choice of $\xi$ into a concrete single-letter formula for the random coding regime.



3 Statement of Results 

In this section, by evaluating the exponential order of $\bar{S}_N$, we derive a formula for the largest value of $\xi$ for which $\bar{S}_N$ is sub-exponential in $N$. Moreover, an expression for a lower bound to $\xi^*$ is also derived, and its tightness is demonstrated for the BSC model and for several ensembles of random codes.

3.1 General codes 

We begin with a few definitions. For every positive integer $N$, let $Q_N$ be a random coding distribution for $N$-vectors of the following form:
$$ Q_N(\mathbf{x}) = \frac{Q_N(T_{\mathbf{x}})}{|T_{\mathbf{x}}|}, \tag{7} $$
i.e., a uniform distribution over all the vectors within the same type class. Of course,
$$ \sum_{T_{\mathbf{x}}}Q_N(T_{\mathbf{x}}) = 1. $$
Now, let
$$ \Lambda_N(P_{\mathbf{x}}) = -\frac{1}{N}\ln Q_N(T_{\mathbf{x}}), $$
and let $\Lambda_N(P)$ be an extension of the function $\Lambda_N(P_{\mathbf{x}})$ that is defined over the continuum of probability distributions over $\mathcal{X}$ (rather than just the set of rational probability distributions with denominator $N$). We next define the class $\mathcal{Q}$ of sequences of random coding distributions $\{Q_N\}$ as follows: a sequence of random coding distributions $\{Q_N\}_{N\ge1}$ is said to belong to the class $\mathcal{Q}$ if there exists such an extension $\Lambda_N(P)$ that converges, as $N\to\infty$, to a certain non-negative functional $\Lambda^*(P)$, uniformly over all probability distributions $P$ over $\mathcal{X}$.

It is easy to see that the class $\mathcal{Q}$ essentially covers all random coding distributions that are customarily used (and much more). In particular, to approximate a random coding distribution which is uniform within a small neighborhood of one type class corresponding to a probability distribution $P_0$, and which vanishes elsewhere, we set $\Lambda^*(P) = 0$ for every $P$ in that neighborhood of $P_0$, and $\Lambda^*(P) = \infty$ elsewhere. For the case where $Q$ is i.i.d., $\Lambda^*(P) = D(P\|Q)$. In particular, if $Q_N(\mathbf{x}) = |\mathcal{X}|^{-N}$ for all $\mathbf{x}\in\mathcal{X}^N$, then
$$ \Lambda^*(P) = \ln|\mathcal{X}| - H(P). $$
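For the uniform i.i.d. ensemble, the convergence $\Lambda_N(P)\to\Lambda^*(P)=\ln|\mathcal{X}|-H(P)$ can be checked numerically: $Q_N(T_{\mathbf{x}}) = |T_{\mathbf{x}}|\cdot|\mathcal{X}|^{-N}$, where $|T_{\mathbf{x}}|$ is a multinomial coefficient. The Python sketch below (helper names hypothetical) computes both sides using `math.lgamma`; the gap shrinks roughly like $\ln N/N$.

```python
import math

def log_type_class_size(counts):
    """ln |T_x| = ln( N! / prod_a N_x(a)! ) for a type with the given letter counts."""
    N = sum(counts)
    return math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)

def Lambda_N_uniform(counts, alphabet_size):
    """Lambda_N(P_x) = -(1/N) ln Q_N(T_x) for Q_N uniform over X^N,
    i.e. Q_N(T_x) = |T_x| * |X|^(-N)."""
    N = sum(counts)
    return -(log_type_class_size(counts) - N * math.log(alphabet_size)) / N

def Lambda_star_uniform(P, alphabet_size):
    """Limiting functional ln|X| - H(P), in nats."""
    H = -sum(p * math.log(p) for p in P if p > 0)
    return math.log(alphabet_size) - H
```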

Given a joint distribution $P_{XY}$, a real $\alpha$, and a value of $\theta\in\Theta$, let
$$ A(\theta,\alpha,P_{XY}) = I(X;Y) + \Lambda^*\Big(\sum_{b\in\mathcal{Y}}P_Y(b)P_{X|Y}(\cdot|b)\Big) - \alpha E\ln P_\theta(Y|X), \tag{8} $$
where $E\{\cdot\}$ is the expectation and $I(X;Y)$ is the mutual information w.r.t. a generic joint distribution $P_{XY}(a,b) = P_Y(b)P_{X|Y}(a|b)$ of the RV's $(X,Y)$.
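The functional $A(\theta,\alpha,P_{XY})$ of (8) is easy to evaluate for a binary-input, binary-output family. The sketch below assumes a BSC with crossover probability $\theta$ and a caller-supplied $\Lambda^*$ (both names hypothetical); with $\Lambda^*(P)=\ln 2-H(P)$ it reproduces the simplified form (11), $\ln 2 - H(X|Y) - \alpha E\ln P_\theta(Y|X)$, which the test checks.

```python
import math

def A_functional(theta, alpha, Pxy, Lambda_star):
    """A(theta, alpha, P_XY) of eq. (8) for a BSC(theta).
    Pxy[a][b] is a joint distribution on {0,1} x {0,1}; Lambda_star maps
    the input marginal P_X (a list of two probabilities) to a number."""
    W = [[1 - theta, theta], [theta, 1 - theta]]       # W[a][b] = P_theta(b|a)
    Px = [Pxy[a][0] + Pxy[a][1] for a in range(2)]     # sum_b P_Y(b) P_{X|Y}(a|b)
    Py = [Pxy[0][b] + Pxy[1][b] for b in range(2)]
    cells = [(a, b) for a in range(2) for b in range(2) if Pxy[a][b] > 0]
    I = sum(Pxy[a][b] * math.log(Pxy[a][b] / (Px[a] * Py[b])) for a, b in cells)
    E_logW = sum(Pxy[a][b] * math.log(W[a][b]) for a, b in cells)
    return I + Lambda_star(Px) - alpha * E_logW

def Lambda_uniform(P):
    """Lambda*(P) = ln 2 - H(P) for the uniform i.i.d. binary ensemble."""
    return math.log(2) + sum(p * math.log(p) for p in P if p > 0)
```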

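Next, for distributions $P_Y$, $P_{X|Y}$ and $P_{X'|Y}$, two parameters $\theta,\theta'\in\Theta$, and reals $0\le\rho\le1$ and $s\ge0$, define
$$ B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho) = A(\theta,1-s\rho,P_{XY}) + \rho\cdot A(\theta',s,P_{X'Y}) - H(Y), \tag{9} $$
where $H(Y)$ is the entropy of $Y$ induced by $P_Y$. Finally, let
$$ \xi^*(R) = \min_{P_{XY}}\min_{\theta'\in\Theta}\max\Bigg\{\min_{\theta\in\Theta}\max_{\substack{0\le\rho\le1\\0\le s\le1/\rho}}\min_{P_{X'|Y}}\frac{B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho)-\rho R}{(1-\rho s)\bar{E}^*(\theta)+\rho s\bar{E}^*(\theta')},\ \max_{\theta\in\Theta}\max_{\substack{0\le\rho\le1\\ s\ge1/\rho}}\min_{P_{X'|Y}}\frac{B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho)-\rho R}{(1-\rho s)\bar{E}^*(\theta)+\rho s\bar{E}^*(\theta')}\Bigg\}. \tag{10} $$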

Our main result in this section is the following:

Theorem 1 Consider a sequence of ensembles of codes, where each codeword is drawn independently under a distribution $Q_N$, and the sequence $\{Q_N\}_{N\ge1}$ is a member of the class $\mathcal{Q}$. Then,

1. For every $\xi\le\xi^*(R)$, $\lim_{N\to\infty}\frac{1}{N}\ln\bar{S}_N\le0$.

2. There exist a sequence of encoders $\{\mathcal{C}_N\}_{N\ge1}$ and minimax decoders $\{\Omega_N\}_{N\ge1}$ (with $\xi=\xi^*(R)$) for which
$$ \liminf_{N\to\infty}\Big[-\frac{1}{N}\ln P_E(\Omega_N|\theta)\Big]\ge\xi^*(R)\cdot\bar{E}^*(\theta) $$
uniformly over $\theta\in\Theta$.

3. For every $\xi>\xi^*(R)$, $\lim_{N\to\infty}\frac{1}{N}\ln\bar{S}_N>0$.

The proof of Theorem 1 appears in Section IV.

We now pause to discuss Theorem 1 and some of its aspects.
The theorem suggests a conceptually simple strategy for universal decoding: given $R$ and the sequence $\{Q_N\}_{N\ge1}$, first compute $\xi^*(R)$ using eq. (10). This may require some nontrivial optimization procedures, but it has to be done only once. If a closed-form analytic expression is not available, the computation can be carried out at least numerically, since this is a single-letter expression. Once $\xi^*(R)$ has been computed, apply the minimax decoding rule with $\xi=\xi^*(R)$; the theorem guarantees that the resulting random coding error exponent associated with the decoder is as specified in its second item. Moreover, the third item of the theorem implies that, in the random coding regime, $\xi^*(R)$ is the largest fraction of $\bar{E}^*(\theta)$ that is uniformly achievable by a universal decoder.

As mentioned earlier, when $Q$ is uniform i.i.d., $\Lambda^*(P) = \ln|\mathcal{X}| - H(X)$ (where $X$ is governed by $P$), and therefore
$$ A(\theta,\alpha,P_{XY}) = \ln|\mathcal{X}| - H(X|Y) - \alpha E\ln P_\theta(Y|X). \tag{11} $$
This observation will be used in Section V, which deals with the BSC model, as well as in Section A.1 of the Appendix (ensembles of linear and systematic linear codes), as they both assume a binary i.i.d. random coding distribution.

The theorem is interesting, of course, only when $\xi^*(R)>0$, which is the case in many situations, at least as long as $R$ is not too large. It should be pointed out that the exponential rate $\xi^*(R)\cdot\bar{E}^*(\theta)$ guaranteed by Theorem 1 is only a lower bound to the true exponential rate (as the minimax criterion guards against the worst case over all $\theta\in\Theta$), and that the true exponential rate, at some points in $\Theta$, might be larger.

As mentioned above, the exact formula for $\xi^*(R)$ given in eq. (10) includes many optimizations and hence might be complicated to calculate. Therefore, we next present a simpler expression for a lower bound to $\xi^*(R)$, denoted by $\xi_{LB}(R)$, which we believe is tight at least for several families of channels. Another motivation for presenting $\xi_{LB}(R)$ is that it holds also for ensembles of linear and systematic linear codes, as we shall see in the next subsection. The expression for $\xi_{LB}(R)$ is derived from $\xi^*(R)$ by: (i) avoiding the inner maximization between the two terms in (10) by choosing the left term, and (ii) interchanging
(12)



As $\xi_{LB}(R)$ is a lower bound to $\xi^*(R)$, parts 1 and 2 of Theorem 1 clearly hold for it as well.

3.2 Linear codes

We next provide a variation of $\xi_{LB}(R)$ for ensembles of linear codes and systematic linear codes. Prior to that, we first define these ensembles. A linear code is defined by mapping each of the $M = 2^K$ binary information (row) vectors $\mathbf{u}_m$, $0\le m\le M-1$, of length $K$, into its corresponding code (row) vector $\mathbf{v}_m$ of length $N$, in the following way:
$$ \mathbf{v}_m = \mathbf{u}_m G\oplus\mathbf{v}_0,\qquad m = 0,1,\ldots,M-1, $$
where $G$ is a binary generator matrix of dimension $K\times N$ and $\mathbf{v}_0$ is an additive vector of length $N$. The $\oplus$ operation denotes summation modulo 2, and the multiplication between $\mathbf{u}_m$ and $G$ is carried out over the field GF(2). A systematic linear code is defined in the same manner, with the restriction that the left $K\times K$ block of $G$ (the systematic part of $G$) forms the identity matrix (thus, up to the additive vector $\mathbf{v}_0$, the first $K$ bits of each code vector $\mathbf{v}_m$ form the corresponding information vector $\mathbf{u}_m$).

We now consider a random coding distribution, which is i.i.d. over the ensemble of linear codes (or systematic linear codes), for which the elements of $G$ (or $\tilde{G}$, the non-systematic part of $G$, in the case of systematic linear codes) and $\mathbf{v}_0$ are drawn independently using a uniform single-letter distribution $Q^* = (\frac{1}{2},\frac{1}{2})$ (fair coin tossing). We also define the family of binary-input, output-symmetric (BIOS) channels as channels with a binary input alphabet $\mathcal{X}$ ("0" and "1") and an output alphabet $\mathcal{Y}$ (possibly infinite), whose transition probabilities satisfy $P(y|0) = P(-y|1)$, $\forall y\in\mathcal{Y}$, for a well-defined operation "$-$" (note that this definition of symmetry can be used as long as each $y\in\mathcal{Y}$ satisfies $-y\in\mathcal{Y}$ as well). For example, the BSC, when mapping "0" $\to +1$ and "1" $\to -1$, is a BIOS channel. The additive Gaussian channel with two antipodal input letters, $x_1$ and $x_2$, is also a BIOS channel.

The following theorem is stated with regard to codes governed by the above-mentioned ensembles and transmitted via a BIOS channel:
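The two linear ensembles can be sampled directly from their definitions. In this Python sketch (function names hypothetical), `random_linear_code` draws $G$ (or only its non-systematic part $\tilde{G}$) and $\mathbf{v}_0$ by fair coin tossing, and `encode` computes $\mathbf{u}_m G\oplus\mathbf{v}_0$ over GF(2). Because of the additive vector, the codewords form a coset of a linear code, so $\mathbf{v}_m\oplus\mathbf{v}_{m'} = (\mathbf{u}_m\oplus\mathbf{u}_{m'})G$.

```python
import random

def random_linear_code(K, N, systematic=False, rng=random):
    """Draw (G, v0) by fair coin tossing. For a systematic code the left K x K
    block of G is the identity and only the K x (N-K) part is random."""
    if systematic:
        G = [[int(i == j) for j in range(K)] + [rng.randint(0, 1) for _ in range(N - K)]
             for i in range(K)]
    else:
        G = [[rng.randint(0, 1) for _ in range(N)] for _ in range(K)]
    v0 = [rng.randint(0, 1) for _ in range(N)]
    return G, v0

def encode(u, G, v0):
    """v = u G (+) v0 over GF(2): XOR into v0 the rows of G selected by u."""
    v = list(v0)
    for ui, row in zip(u, G):
        if ui:
            v = [vi ^ gi for vi, gi in zip(v, row)]
    return v
```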
Theorem 2 Consider the sequence of ensembles of linear or systematic linear codes, where the elements of $G$ (or $\tilde{G}$) and $\mathbf{v}_0$ are drawn independently by fair coin tossing. Let $\{P_\theta,\ \theta\in\Theta\}$ be a family of BIOS DMC's. Then, the lower bound $\xi_{LB}(R)$ of eq. (12) continues to hold, with $\Lambda^*(P) = \ln 2 - H(P)$.

Theorem 2 is proved in Section A.1 of the Appendix.

The derivation of the single-letter expression for $\xi_{LB}(R)$ is carried out (see Section A.1 of the Appendix) using the same techniques as in Gallager's classical work, which are tight in the random coding sense. We therefore believe that the achievable lower bounds to the true exponential rates are tight as well. To demonstrate the tightness of the lower bounds suggested in (12) (for general codes) and in Theorem 2 (for linear and systematic linear codes), we have the following lemma:

Lemma 1 Consider the family of BSC's parameterized by the crossover probability $\theta$. Then $\xi_{LB}(R) = 1$ and, hence, $\xi^*(R) = 1$, in the following cases:
(i) The ensemble of all codes with $Q_N(\mathbf{x}) = 2^{-N}$ for all $\mathbf{x}$.
(ii) The ensemble of linear codes and systematic linear codes, as in Theorem 2, with $\Lambda^*(P) = \ln 2 - H(P)$.

Lemma 1 is proved in Section V. 

It should be mentioned that the universal achievability of $\xi^* = 1$ under the BSC model, by random coding over general codes and over linear codes, is by no means new, as it was already proved and discussed in [1]. Nevertheless, it demonstrates the tightness of $\xi_{LB}(R)$. However, to the best of our knowledge, the same result regarding ensembles of systematic linear codes has not been proved before and is first shown here.

3.3 Convolutional codes 

For the special case of the BSC mentioned above, we now introduce the following result, related to ensembles of time-varying convolutional codes, when minimax decoding is used. Prior to that, we first define this ensemble and the bit error exponent related to it.

A convolutional code of rate $b/n$ ($b$, $n$ positive integers) and constraint length $Kb$ is defined as one for which, at each time instant $t\ge0$, the code vector of length $n$, $\mathbf{v}_t$, is obtained by
$$ \mathbf{v}_t = \sum_{j=0}^{\min\{t,K-1\}}\mathbf{u}_{t-j}G_j\oplus\mathbf{v}_0, \tag{13} $$
where $\mathbf{u}_{t-j}$ is a binary information row vector of length $b$ at time $t-j$, $G_j$, $0\le j\le K-1$, are binary matrices with $b$ rows and $n$ columns each, $\mathbf{v}_0$ is a vector of length $n$, and all arithmetic is carried out over GF(2).

Let us now consider a code $\mathcal{C}$, governed by i.i.d. random coding over the ensemble of time-varying convolutional codes, whose code vector at time instant $t\ge0$, $\mathbf{v}_t$, is obtained by
$$ \mathbf{v}_t = \sum_{j=0}^{\min\{t,K-1\}}\mathbf{u}_{t-j}G_j^t\oplus\mathbf{v}_0^t, \tag{14} $$
where, at each time instant $t$, the elements of $G_j^t$, $0\le j\le K-1$, and $\mathbf{v}_0^t$ are drawn independently using the uniform single-letter distribution $(\frac{1}{2},\frac{1}{2})$.
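The time-varying ensemble of (14) can be sampled in the same way. This sketch assumes that all $b\times n$ matrices $G_j^t$ and offsets $\mathbf{v}_0^t$ are drawn up front for a finite horizon of $T$ time instants; `draw_tv_conv_code` and `tv_conv_encode` are hypothetical names. An input block entering at time $t$ influences only the outputs at times $t,\ldots,t+K-1$, which the test checks.

```python
import random

def draw_tv_conv_code(T, K, b, n, rng):
    """For every time t < T draw G_0^t, ..., G_{K-1}^t (b x n each) and v0^t
    (length n) independently by fair coin tossing, as in eq. (14)."""
    code = []
    for _ in range(T):
        Gs = [[[rng.randint(0, 1) for _ in range(n)] for _ in range(b)] for _ in range(K)]
        v0 = [rng.randint(0, 1) for _ in range(n)]
        code.append((Gs, v0))
    return code

def tv_conv_encode(u_blocks, code):
    """v_t = sum_{j=0}^{min(t,K-1)} u_{t-j} G_j^t (+) v0^t, all arithmetic over GF(2)."""
    out = []
    for t in range(len(u_blocks)):
        Gs, v0 = code[t]
        v = list(v0)
        for j in range(min(t, len(Gs) - 1) + 1):
            # row i of G_j^t is added when bit i of u_{t-j} is 1
            for bit, row in zip(u_blocks[t - j], Gs[j]):
                if bit:
                    v = [vi ^ gi for vi, gi in zip(v, row)]
        out.append(v)
    return out
```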

The average bit error probability $P_b(\Omega_K)$, associated with a sequence of decoders $\Omega_K = \{\Omega_{K,N}\}_{N\ge1}$ of block length $N$ and constraint length $K$, and averaged over the ensemble of time-varying convolutional codes, is defined as the expected relative frequency of bit errors in the decoded information stream, i.e.,
$$ P_b(\Omega_K) = \limsup_{N\to\infty}P_b(\Omega_K,N). \tag{15} $$

The bit error exponent associated with a sequence of decoders $\Omega = \{\Omega_K\}_{K\ge1}$ is defined as



Eb{n) = -\imsui) — lnPb{nK). (16) 

Theorem 3 Consider the sequence of ensembles of time-varying convolutional codes of rate $b/n$ and constraint length $Kb$ (with $K\to\infty$), described in the previous paragraph, and assume a family of BSC's parameterized by the crossover probability $\theta$. Then the bit error exponent (as defined in (16)) achievable using the minimax decoder is equal to the one achieved by the ML decoder when $\theta$ is known.

The proof of this theorem is based on the following observation: under the BSC model with an unknown crossover probability $\theta$, the minimax decision rule (as defined in (3)) is equivalent to a decision rule, denoted by $\Lambda$ and defined as
$$ \Lambda_m = \{\mathbf{y} : \rho(\mathbf{x}_m,\mathbf{y}) < \rho(\mathbf{x}_{m'},\mathbf{y}),\ \forall m'\ne m\}, \tag{17} $$
with ties broken arbitrarily, where
$$ \rho(\mathbf{x},\mathbf{y}) = \min\{\delta(\mathbf{x},\mathbf{y}),\,1-\delta(\mathbf{x},\mathbf{y})\}. \tag{18} $$
As mentioned in Section II, $\delta(\mathbf{x},\mathbf{y})$ denotes the normalized Hamming distance between $\mathbf{x}$ and $\mathbf{y}$. This equivalence is proved in Section A.7 of the Appendix. We note that in this case, the minimax decoder coincides with the MMI decoder as well. Based on this equivalence, the full proof of Theorem 3 is given in Section VII. We also introduce an efficient implementation of minimax decoding, based on a slightly modified version of the Viterbi algorithm. This is done by applying the Viterbi algorithm twice: first for minimum Hamming distance, and then for maximum Hamming distance. This process results in two survivors, and the selection between them is made in favor of the one whose normalized Hamming distance is more distant from $\frac{1}{2}$ (i.e., the one with the smaller $\rho$).
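The two-trellis procedure works because $\min_m\min\{\delta_m,1-\delta_m\} = \min\{\delta_{\min},1-\delta_{\max}\}$, where $\delta_{\min}$ and $\delta_{\max}$ are the smallest and largest normalized distances over the code, and each of these is found by one Viterbi pass. The Python sketch below illustrates the idea under simplifying assumptions that are not the paper's setting: a fixed, time-invariant rate-$1/n$ feedforward code given by tap lists (the $(7,5)$ code in the test), a trellis terminated by $K-1$ zero tail bits, no additive offset $\mathbf{v}_0$, and hard decisions. For the time-varying ensemble only the branch labels would change with $t$. All function names are hypothetical, and a brute-force reference is included for checking.

```python
from itertools import product

def conv_encode(bits, gens):
    """Rate-1/n feedforward encoder started in the zero state; gens[i] holds the
    K taps of output i (tap 0 multiplies the current input bit)."""
    state = [0] * (len(gens[0]) - 1)  # previous inputs, most recent first
    out = []
    for u in bits:
        window = [u] + state
        out.extend(sum(g * w for g, w in zip(gen, window)) % 2 for gen in gens)
        state = window[:-1]
    return out

def viterbi(y, gens, L, maximize=False):
    """Viterbi search over L free input bits plus K-1 zero tail bits. Returns
    (info bits, Hamming distance to y) of the path with minimum (or maximum) distance."""
    K, n = len(gens[0]), len(gens)
    better = (lambda a, b: a > b) if maximize else (lambda a, b: a < b)
    survivors = {(0,) * (K - 1): (0, [])}  # state -> (metric, input path)
    for t in range(L + K - 1):
        r = y[t * n:(t + 1) * n]
        nxt = {}
        for state, (metric, path) in survivors.items():
            for u in ((0, 1) if t < L else (0,)):  # tail inputs are forced to 0
                window = (u,) + state
                branch = [sum(g * w for g, w in zip(gen, window)) % 2 for gen in gens]
                m = metric + sum(a != b for a, b in zip(branch, r))
                ns = window[:-1]
                if ns not in nxt or better(m, nxt[ns][0]):
                    nxt[ns] = (m, path + [u])
        survivors = nxt
    metric, path = survivors[(0,) * (K - 1)]
    return path[:L], metric

def minimax_viterbi(y, gens, L):
    """Two trellis passes; keep the survivor with the smaller min(delta, 1 - delta)."""
    N = len(y)
    candidates = [viterbi(y, gens, L, maximize=False), viterbi(y, gens, L, maximize=True)]
    return min(candidates, key=lambda c: min(c[1] / N, 1 - c[1] / N))

def brute_force_rho(y, gens, L):
    """Reference value: min over all 2^L codewords of min(delta, 1 - delta)."""
    N, tail = len(y), [0] * (len(gens[0]) - 1)
    best = 1.0
    for bits in product((0, 1), repeat=L):
        d = sum(a != b for a, b in zip(conv_encode(list(bits) + tail, gens), y))
        best = min(best, d / N, 1 - d / N)
    return best
```

The cost is two ordinary Viterbi passes instead of one, rather than an exhaustive search over all codewords.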

4 Proof of Theorem 1 

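We first observe that for a DMC $\{P_\theta(y|x),\ x\in\mathcal{X},\ y\in\mathcal{Y}\}$, and for each vector pair $(\mathbf{x},\mathbf{y})$, the minimax metric for a given $\theta$, $f_\theta(\mathbf{x},\mathbf{y})$, depends on $\mathbf{x}$ and $\mathbf{y}$ only via their joint empirical distribution:
$$ f_\theta(\mathbf{x},\mathbf{y}) = E_{\mathbf{xy}}\ln P_\theta(Y|X) + \xi\bar{E}^*(\theta). \tag{19} $$
We therefore conclude that the value of $\theta$ maximizing $f_\theta(\mathbf{x},\mathbf{y})$ also depends on $\mathbf{x}$ and $\mathbf{y}$ only via their joint empirical distribution. Let $\Theta_N$ denote the subset of $\Theta$ consisting of the values of $\theta$ that achieve $\max_\theta f_\theta(\mathbf{x},\mathbf{y}) = f(\mathbf{x},\mathbf{y})$ as $(\mathbf{x},\mathbf{y})$ exhausts $\mathcal{X}^N\times\mathcal{Y}^N$. In the decoding process, the maximum over $\theta$ can be achieved only by points in $\Theta_N$. Since the number of joint empirical distributions of $(\mathbf{x},\mathbf{y})$ is upper bounded by $(N+1)^{|\mathcal{X}|\cdot|\mathcal{Y}|}$, then $|\Theta_N|\le(N+1)^{|\mathcal{X}|\cdot|\mathcal{Y}|}$ as well.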
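The polynomial bound on the number of joint types can be checked on small cases. For binary $\mathbf{x}$ and $\mathbf{y}$, a joint type is a composition of $N$ into $|\mathcal{X}|\cdot|\mathcal{Y}|=4$ nonnegative counts, so there are exactly $\binom{N+3}{3}$ of them, far fewer than $(N+1)^4$. The Python sketch below (names hypothetical) compares this closed form with brute-force enumeration.

```python
from itertools import product
from math import comb

def count_joint_types(N, nx, ny):
    """Number of joint types of length-N sequence pairs over an nx-by-ny alphabet:
    compositions of N into nx*ny nonnegative parts."""
    return comb(N + nx * ny - 1, nx * ny - 1)

def count_joint_types_brute(N, nx, ny):
    """Brute force: collect the distinct joint count vectors N_xy(a, b)."""
    seen = set()
    for x in product(range(nx), repeat=N):
        for y in product(range(ny), repeat=N):
            counts = [0] * (nx * ny)
            for a, b in zip(x, y):
                counts[a * ny + b] += 1
            seen.add(tuple(counts))
    return len(seen)
```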

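As a first step, we assume given channel input and output vectors, $\mathbf{x}$ and $\mathbf{y}$, respectively. Considering a random coding distribution $Q_N$, we evaluate, on the exponential scale, the probability that another codeword $\mathbf{X}'$ is preferred by the minimax decoder over $\mathbf{x}$. This probability will be denoted by $a(\mathbf{x},\mathbf{y})$:
$$
\begin{aligned}
a(\mathbf{x},\mathbf{y}) &= Q_N\{f(\mathbf{X}',\mathbf{y})\ge f(\mathbf{x},\mathbf{y})\}\\
&= Q_N\Big\{\max_{\theta'\in\Theta_N}f_{\theta'}(\mathbf{X}',\mathbf{y})\ge f(\mathbf{x},\mathbf{y})\Big\}\\
&\stackrel{(a)}{\doteq} \max_{\theta'\in\Theta_N}Q_N\{f_{\theta'}(\mathbf{X}',\mathbf{y})\ge f(\mathbf{x},\mathbf{y})\}\\
&= \max_{\theta'\in\Theta_N}Q_N\Big\{\sum_{i=1}^N\ln P_{\theta'}(y_i|X'_i)\ge -N\xi\bar{E}^*(\theta')+N f(\mathbf{x},\mathbf{y})\Big\}\\
&\stackrel{(b)}{\doteq} \max_{\theta'\in\Theta_N}\min_{s\ge0}\frac{E_{Q_N}\exp\Big\{s\Big[\sum_{i=1}^N\ln P_{\theta'}(y_i|X'_i)+N\xi\bar{E}^*(\theta')\Big]\Big\}}{\exp\{sNf(\mathbf{x},\mathbf{y})\}}\\
&= \max_{\theta'\in\Theta_N}\min_{s\ge0}E_{Q_N}\Big[e^{sNf_{\theta'}(\mathbf{X}',\mathbf{y})}\Big]\cdot e^{-sNf(\mathbf{x},\mathbf{y})},
\end{aligned}
\tag{20}
$$
where (a) is true since
$$
\begin{aligned}
\max_{\theta'\in\Theta_N}Q_N\{f_{\theta'}(\mathbf{X}',\mathbf{y})\ge f(\mathbf{x},\mathbf{y})\} &\le Q_N\Big\{\max_{\theta'\in\Theta_N}f_{\theta'}(\mathbf{X}',\mathbf{y})\ge f(\mathbf{x},\mathbf{y})\Big\}\\
&= Q_N\Big\{\bigcup_{\theta'\in\Theta_N}\{f_{\theta'}(\mathbf{X}',\mathbf{y})\ge f(\mathbf{x},\mathbf{y})\}\Big\}\\
&\le \sum_{\theta'\in\Theta_N}Q_N\{f_{\theta'}(\mathbf{X}',\mathbf{y})\ge f(\mathbf{x},\mathbf{y})\}\\
&\le |\Theta_N|\cdot\max_{\theta'\in\Theta_N}Q_N\{f_{\theta'}(\mathbf{X}',\mathbf{y})\ge f(\mathbf{x},\mathbf{y})\},
\end{aligned}
\tag{21}
$$
and $|\Theta_N|$ grows only polynomially in $N$; in (b) we used the Chernoff bound, which is tight in the exponential sense.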
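The exponential tightness of the Chernoff bound in step (b) can be seen in a simple special case. Assume, for illustration only, a BSC metric with parameter $\theta'$ and a uniform i.i.d. $Q_N$ over $\{0,1\}^N$; then $Z=\sum_i\ln P_{\theta'}(y_i|X'_i) = k\ln\theta'+(N-k)\ln(1-\theta')$ with $k\sim\mathrm{Binomial}(N,1/2)$, whatever $\mathbf{y}$ is. The Python sketch compares the exact tail probability with $\min_{s\ge0}E[e^{sZ}]e^{-st}$, minimized on a grid of $s$ in the log domain to avoid overflow; the function names, the grid, and the threshold choice are assumptions.

```python
import math

def exact_tail(N, theta, t):
    """Q_N{ sum_i ln P_theta(y_i|X_i) >= t } for X uniform i.i.d. on {0,1}: with k
    disagreements the sum is k ln(theta) + (N-k) ln(1-theta), k ~ Binomial(N, 1/2)."""
    hits = sum(math.comb(N, k) for k in range(N + 1)
               if k * math.log(theta) + (N - k) * math.log(1 - theta) >= t)
    return hits / 2 ** N

def chernoff_bound(N, theta, t, grid=2000, s_max=20.0):
    """min over s >= 0 of E[exp(s * sum)] * exp(-s t), on a grid of s values."""
    best_log = 0.0  # s = 0 gives the trivial bound exp(0) = 1
    for i in range(1, grid + 1):
        s = s_max * i / grid
        log_mgf = N * math.log((theta ** s + (1 - theta) ** s) / 2)
        best_log = min(best_log, log_mgf - s * t)
    return math.exp(best_log)
```

For instance, with $\theta'=0.1$ and the threshold placed halfway between the lattice values for $k=0.2N$ and $k=0.2N+1$, both exponents $-\frac1N\ln(\cdot)$ approach $D(0.2\|0.5)\approx0.193$ nats as $N$ grows.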

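By using the method of types, it is proved in Section A.3 of the Appendix that for any real $\alpha$,
$$ E_{Q_N}\Big[e^{\alpha Nf_{\theta}(\mathbf{X}',\mathbf{y})}\Big] \doteq \exp\Big\{N\Big[\alpha\xi\bar{E}^*(\theta)-\min_{P_{\mathbf{x}'|\mathbf{y}}}A(\theta,\alpha,P_{\mathbf{x}'\mathbf{y}})\Big]\Big\}, \tag{22} $$
where the function $A(\theta,\alpha,P_{XY})$ is defined as in (8).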

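Using this observation, we can continue to evaluate $a(\mathbf{x},\mathbf{y})$ as follows:
$$
\begin{aligned}
a(\mathbf{x},\mathbf{y}) &\doteq \max_{\theta'\in\Theta_N}\min_{s\ge0}\exp\Big\{N\Big[s\xi\bar{E}^*(\theta')-\min_{P_{\mathbf{x}'|\mathbf{y}}}A(\theta',s,P_{\mathbf{x}'\mathbf{y}})\Big]\Big\}\cdot\exp\{-sNf(\mathbf{x},\mathbf{y})\}\\
&= \max_{\theta'\in\Theta_N}\min_{s\ge0}\exp\Big\{-N\Big[-s\xi\bar{E}^*(\theta')+\min_{P_{\mathbf{x}'|\mathbf{y}}}A(\theta',s,P_{\mathbf{x}'\mathbf{y}})+sf(\mathbf{x},\mathbf{y})\Big]\Big\}\\
&\triangleq \max_{\theta'\in\Theta_N}\min_{s\ge0}\exp\{-N\,G(\theta',s,\xi,P_{\mathbf{xy}})\}.
\end{aligned}
\tag{23}
$$
Therefore, the probability that the decoder will prefer any of the other $M-1$ codevectors over the transmitted codevector $\mathbf{x}$ can be evaluated as follows:
$$
\begin{aligned}
1-\big(1-a(\mathbf{x},\mathbf{y})\big)^{M-1} &\stackrel{(a)}{\doteq} \min\{1,\,e^{NR}\cdot a(\mathbf{x},\mathbf{y})\}\\
&\doteq \min\Big\{1,\,\max_{\theta'\in\Theta_N}\min_{s\ge0}\exp\{-N[G(\theta',s,\xi,P_{\mathbf{xy}})-R]\}\Big\}\\
&= \max_{\theta'\in\Theta_N}\min_{s\ge0}\exp\Big\{-N\cdot\max_{0\le\rho\le1}\rho\,[G(\theta',s,\xi,P_{\mathbf{xy}})-R]\Big\}\\
&= \max_{\theta'\in\Theta_N}\min_{s\ge0}\min_{0\le\rho\le1}\exp\{-N[\rho G(\theta',s,\xi,P_{\mathbf{xy}})-\rho R]\},
\end{aligned}
\tag{24}
$$
where the equivalence in (a) (see [9, Section V] and [8, Section A.2, pp. 109-110]) implies that the union bound in the random coding error exponent is tight, and the third line uses the identity $\min\{1,e^{-Nt}\} = \exp\{-N\max_{0\le\rho\le1}\rho t\}$.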
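Step (a) in (24) rests on the elementary two-sided bound $(1-e^{-1})\min\{1,na\}\le 1-(1-a)^n\le\min\{1,na\}$ for $a\in[0,1]$ (the lower bound follows from $1-(1-a)^n\ge1-e^{-na}$ and the concavity of $1-e^{-x}$), so the truncated union bound is tight up to a constant factor and hence in the exponential sense. The Python sketch below (names hypothetical) checks the inequality numerically with $n=M-1$ competitors.

```python
import math

def prob_some_competitor_wins(a, n):
    """1 - (1 - a)^n: probability that at least one of n independent
    competitors, each winning with probability a, beats the transmitted codeword."""
    if a >= 1:
        return 1.0
    return -math.expm1(n * math.log1p(-a))  # numerically stable for tiny a

def truncated_union_bound(a, n):
    """min{1, n * a}."""
    return min(1.0, n * a)
```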






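Now, we will evaluate $\bar{S}_N$, the average of the minimax criterion over the ensemble of codebooks governed by a random coding distribution, for the minimax decoder defined in (3):
$$
\begin{aligned}
\bar{S}_N &= \max_{\theta\in\Theta}\frac{\bar{P}_E(\Omega|\theta)}{[\bar{P}_E^*(\theta)]^{\xi}}\\
&\doteq \max_{\theta\in\Theta}e^{N\xi\bar{E}^*(\theta)}\sum_{\mathbf{x}}Q_N(\mathbf{x})\sum_{\mathbf{y}}P_\theta(\mathbf{y}|\mathbf{x})\Big[1-\big(1-a(\mathbf{x},\mathbf{y})\big)^{M-1}\Big]\\
&\doteq \max_{\theta\in\Theta}\sum_{\mathbf{x},\mathbf{y}}Q_N(\mathbf{x})\,e^{Nf_\theta(\mathbf{x},\mathbf{y})}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ s\ge0}}\exp\{-N[\rho G(\theta',s,\xi,P_{\mathbf{xy}})-\rho R]\}\\
&\stackrel{(a)}{\doteq} \max_{\theta\in\Theta}\max_{P_{\mathbf{xy}}}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ s\ge0}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\exp\Big\{N\big[-\Lambda_N(P_{\mathbf{x}})+H_{\mathbf{xy}}(Y|X)+f_\theta(\mathbf{x},\mathbf{y})+\rho s\xi\bar{E}^*(\theta')-\rho A(\theta',s,P_{\mathbf{x}'\mathbf{y}})-\rho sf(\mathbf{x},\mathbf{y})+\rho R\big]\Big\}\\
&\stackrel{(b)}{\doteq} \max_{P_{\mathbf{xy}}}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ s\ge0}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\Big\{\exp\big[N\,\Gamma(\theta',P_{\mathbf{y}},P_{\mathbf{x}|\mathbf{y}},P_{\mathbf{x}'|\mathbf{y}},s,\rho,\xi,R)\big]\cdot\Big[\max_{\theta\in\Theta_N}e^{Nf_\theta(\mathbf{x},\mathbf{y})}\Big]^{1-\rho s}\Big\},
\end{aligned}
\tag{25}
$$
where $\Gamma(\theta',P_{\mathbf{y}},P_{\mathbf{x}|\mathbf{y}},P_{\mathbf{x}'|\mathbf{y}},s,\rho,\xi,R) = -\Lambda^*(P_{\mathbf{x}})+H_{\mathbf{xy}}(Y|X)+\rho s\xi\bar{E}^*(\theta')-\rho A(\theta',s,P_{\mathbf{x}'\mathbf{y}})+\rho R$. In (a), we switched to a summation over the joint empirical types of $\mathbf{x}$ and $\mathbf{y}$ (which is legitimate since both $f_\theta(\mathbf{x},\mathbf{y})$ and $G(\theta',s,\xi,P_{\mathbf{xy}})$ depend on $\mathbf{x}$ and $\mathbf{y}$ only via their joint empirical distribution), used $Q_N(\mathbf{x}) = e^{-N\Lambda_N(P_{\mathbf{x}})}/|T_{\mathbf{x}}|$ and $|T_{\mathbf{xy}}|/|T_{\mathbf{x}}|\doteq e^{NH_{\mathbf{xy}}(Y|X)}$, and expanded $G$ according to (23). In (b), we used the convergence assumption of the random coding distributions within the class $\mathcal{Q}$ to claim that $\Lambda_N(P_{\mathbf{x}})\to\Lambda^*(P_{\mathbf{x}})$ as $N\to\infty$, uniformly in $P_{\mathbf{x}}$, and also united the maximization over $\theta$ with the maximization over $\theta$ that defines $f(\mathbf{x},\mathbf{y})$ in (4).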
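We should observe that:
$$
\begin{aligned}
&\min_{\substack{0\le\rho\le1\\ s\ge0}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\Big\{e^{N\Gamma(\theta',P_{\mathbf{y}},P_{\mathbf{x}|\mathbf{y}},P_{\mathbf{x}'|\mathbf{y}},s,\rho,\xi,R)}\Big[\max_{\theta\in\Theta_N}e^{Nf_\theta(\mathbf{x},\mathbf{y})}\Big]^{1-\rho s}\Big\}\\
&\quad=\min\Big\{\min_{\substack{0\le\rho\le1\\0\le s\le1/\rho}}\max_{\theta\in\Theta_N}\max_{P_{\mathbf{x}'|\mathbf{y}}}e^{N[\Gamma(\cdot)+(1-\rho s)f_\theta(\mathbf{x},\mathbf{y})]},\ \min_{\substack{0\le\rho\le1\\ s\ge1/\rho}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\min_{\theta\in\Theta_N}e^{N[\Gamma(\cdot)+(1-\rho s)f_\theta(\mathbf{x},\mathbf{y})]}\Big\},
\end{aligned}
\tag{26}
$$
since $\big[\max_\theta e^{Nf_\theta(\mathbf{x},\mathbf{y})}\big]^{1-\rho s}$ equals $\max_\theta e^{N(1-\rho s)f_\theta(\mathbf{x},\mathbf{y})}$ when $1-\rho s\ge0$, and $\min_\theta e^{N(1-\rho s)f_\theta(\mathbf{x},\mathbf{y})}$ when $1-\rho s<0$. Defining
$$ \tilde\Gamma(\theta,\theta',P_{\mathbf{y}},P_{\mathbf{x}|\mathbf{y}},P_{\mathbf{x}'|\mathbf{y}},s,\rho,\xi,R) = \Gamma(\theta',P_{\mathbf{y}},P_{\mathbf{x}|\mathbf{y}},P_{\mathbf{x}'|\mathbf{y}},s,\rho,\xi,R)+(1-\rho s)f_\theta(\mathbf{x},\mathbf{y}), $$
we obtain
$$ \text{r.h.s. of (26)} \stackrel{(a)}{=} \min\Big\{\max_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\0\le s\le1/\rho}}\max_{P_{\mathbf{x}'|\mathbf{y}}}e^{N\tilde\Gamma(\theta,\theta',P_{\mathbf{y}},P_{\mathbf{x}|\mathbf{y}},P_{\mathbf{x}'|\mathbf{y}},s,\rho,\xi,R)},\ \min_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\ s\ge1/\rho}}\max_{P_{\mathbf{x}'|\mathbf{y}}}e^{N\tilde\Gamma(\theta,\theta',P_{\mathbf{y}},P_{\mathbf{x}|\mathbf{y}},P_{\mathbf{x}'|\mathbf{y}},s,\rho,\xi,R)}\Big\}, \tag{27} $$
where in (a), two interchanges are made: one between the minimization over $\rho$ and $s$ and the maximization over $\theta$ in the left term of the outer minimization, and one between the maximization over $P_{\mathbf{x}'|\mathbf{y}}$ and the minimization over $\theta$ in the right term of the outer minimization. The first interchange is justified in Section A.2 of the Appendix. The second interchange is possible since the term to be optimized is a product of two exponential terms, one of which depends only on $P_{\mathbf{x}'|\mathbf{y}}$ and the other only on $\theta$; therefore, the two optimizations can be done independently.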

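Consequently, we conclude that, with $\tilde\Gamma(\cdot)$ as shorthand for $\tilde\Gamma(\theta,\theta',P_{\mathbf{y}},P_{\mathbf{x}|\mathbf{y}},P_{\mathbf{x}'|\mathbf{y}},s,\rho,\xi,R)$,
$$
\begin{aligned}
\bar{S}_N &\doteq \max_{P_{\mathbf{xy}}}\max_{\theta'\in\Theta_N}\min\Big\{\max_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\0\le s\le1/\rho}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\exp\{N\tilde\Gamma(\cdot)\},\ \min_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\ s\ge1/\rho}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\exp\{N\tilde\Gamma(\cdot)\}\Big\}\\
&= \max_{P_{\mathbf{xy}}}\max_{\theta'\in\Theta_N}\min\Big\{\exp\Big\{N\max_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\0\le s\le1/\rho}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\tilde\Gamma(\cdot)\Big\},\ \exp\Big\{N\min_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\ s\ge1/\rho}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\tilde\Gamma(\cdot)\Big\}\Big\}\\
&= \exp\Big\{N\cdot\max_{P_{\mathbf{xy}}}\max_{\theta'\in\Theta_N}\min\Big\{\max_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\0\le s\le1/\rho}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\tilde\Gamma(\cdot),\ \min_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\ s\ge1/\rho}}\max_{P_{\mathbf{x}'|\mathbf{y}}}\tilde\Gamma(\cdot)\Big\}\Big\}.
\end{aligned}
\tag{28}
$$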



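Now,

$$
\begin{aligned}
\tilde\Gamma(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho,\xi,R)
&=-\Lambda^*(P_X)+H_Y(Y)-I_{XY}(X;Y)+\rho s\xi E^*(\theta')-\rho A(\theta',s,P_{X'Y})\\
&\quad+(1-\rho s)E_{XY}\ln P_\theta(Y|X)+(1-\rho s)\xi E^*(\theta)+\rho R\\
&=-A(\theta,1-\rho s,P_{XY})-\rho A(\theta',s,P_{X'Y})+H_Y(Y)+\rho s\xi E^*(\theta')+(1-\rho s)\xi E^*(\theta)+\rho R\\
&=-B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho)+\rho s\xi E^*(\theta')+(1-\rho s)\xi E^*(\theta)+\rho R,
\end{aligned}
\qquad(29)
$$

where the function $B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho)$ is defined as in (9).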

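Therefore, in order for $S_N$ to grow sub-exponentially with $N$, we seek the maximal $\xi$ such that:

$$
\begin{aligned}
\max_{P_{XY}}\max_{\theta'\in\Theta_N}\min\Big\{&\max_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le s\le1/\rho}}\max_{P_{X'|Y}}\tilde\Gamma(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho,\xi,R),\\
&\min_{\theta\in\Theta_N}\min_{\substack{0\le\rho\le1\\ s>1/\rho}}\max_{P_{X'|Y}}\tilde\Gamma(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho,\xi,R)\Big\}\le0.
\end{aligned}
\qquad(30)
$$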

As $N\to\infty$, the empirical distributions become dense in the continuum of probability distributions, and since the function $\tilde\Gamma(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho,\xi,R)$ is continuous in $P_Y$, $P_{X|Y}$ and $P_{X'|Y}$, it is equivalent to perform the above optimizations over continuous distributions rather than over empirical distributions. The same argument can be used to broaden the optimization space for $\theta$ and $\theta'$ from $\Theta_N$ to $\Theta$. Thus, the condition becomes:

$$
\begin{aligned}
\max_{P_{XY}}\max_{\theta'\in\Theta}\min\Big\{&\max_{\theta\in\Theta}\min_{\substack{0\le\rho\le1\\ 0\le s\le1/\rho}}\max_{P_{X'|Y}}\tilde\Gamma(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho,\xi,R),\\
&\min_{\theta\in\Theta}\min_{\substack{0\le\rho\le1\\ s>1/\rho}}\max_{P_{X'|Y}}\tilde\Gamma(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho,\xi,R)\Big\}\le0.
\end{aligned}
\qquad(31)
$$
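In other words, a maximal $\xi$ is sought such that, for every $P_{XY}$ and every $\theta'\in\Theta$,

$$\max_{\theta\in\Theta}\min_{\substack{0\le\rho\le1\\ 0\le s\le1/\rho}}\max_{P_{X'|Y}}\tilde\Gamma(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho,\xi,R)\le0\qquad(32)$$

or

$$\min_{\theta\in\Theta}\min_{\substack{0\le\rho\le1\\ s>1/\rho}}\max_{P_{X'|Y}}\tilde\Gamma(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho,\xi,R)\le0.\qquad(33)$$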



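An equivalent condition is: for every $P_{XY}$ and every $\theta'\in\Theta$,

$$\xi\le\min_{\theta\in\Theta}\max_{\substack{0\le\rho\le1\\ 0\le s\le1/\rho}}\min_{P_{X'|Y}}\frac{B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho)-\rho R}{(1-\rho s)E^*(\theta)+\rho sE^*(\theta')}\qquad(34)$$

or

$$\xi\le\max_{\theta\in\Theta}\max_{\substack{0\le\rho\le1\\ s>1/\rho}}\min_{P_{X'|Y}}\frac{B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho)-\rho R}{(1-\rho s)E^*(\theta)+\rho sE^*(\theta')}.\qquad(35)$$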



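Therefore,

$$
\begin{aligned}
\xi^*(R)=\min_{P_{XY}}\min_{\theta'\in\Theta}\max\Big\{&\min_{\theta\in\Theta}\max_{\substack{0\le\rho\le1\\ 0\le s\le1/\rho}}\min_{P_{X'|Y}}\frac{B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho)-\rho R}{(1-\rho s)E^*(\theta)+\rho sE^*(\theta')},\\
&\max_{\theta\in\Theta}\max_{\substack{0\le\rho\le1\\ s>1/\rho}}\min_{P_{X'|Y}}\frac{B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},s,\rho)-\rho R}{(1-\rho s)E^*(\theta)+\rho sE^*(\theta')}\Big\}.
\end{aligned}
\qquad(36)
$$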

5 Example - the BSC 

In this section, we demonstrate that for the special case of a BSC with an unknown crossover probability and a uniform random coding distribution, the lower bound (12), denoted here by $\xi_L(R)$, satisfies $\xi_L(R)=1$, and hence $\xi^*(R)=1$, in agreement with well-known results [1].

Consider the lower bound (12) and choose the uniform single-letter random coding distribution $Q^*=\{\tfrac12,\tfrac12\}$. Now, the value of $A(\theta,\alpha,P_{XY})$ is (see (11)):

$$A(\theta,\alpha,P_{XY})=\ln2-H(X|Y)-\alpha E\ln P_\theta(Y|X).\qquad(37)$$
Therefore,

$$\min_{P_{X|Y}}A(\theta,\alpha,P_{XY})=\ln2-\max_{P_{X|Y}}\big\{H(X|Y)+\alpha E\ln P_\theta(Y|X)\big\}.\qquad(38)$$



In addition, for the case of a BSC with an unknown crossover probability $\theta$, we have (see [7], Section VI):

$$\max_{P_{X|Y}}\big\{H(X|Y)+\alpha E\ln P_\theta(Y|X)\big\}=\ln\big[(1-\theta)^\alpha+\theta^\alpha\big]\triangleq V(\theta,\alpha).\qquad(39)$$



From these two observations, we conclude that:

$$\min_{P_{X|Y}}A(\theta,\alpha,P_{XY})=\ln2-V(\theta,\alpha).\qquad(40)$$
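The closed form in (39) can be checked numerically. The Python sketch below is only an illustration (the helper names `V`, `binary_entropy` and `V_by_search` are hypothetical); it compares $V(\theta,\alpha)$ with a brute-force grid search, assuming, as holds for the BSC, that the objective depends on $P_{X|Y}$ only through the conditional error probability $\Pr\{X\ne y\mid Y=y\}$, so that a one-dimensional search suffices.

```python
import math

def V(theta, alpha):
    """Closed form (39): V(theta, alpha) = ln[(1 - theta)^alpha + theta^alpha]."""
    return math.log((1 - theta) ** alpha + theta ** alpha)

def binary_entropy(d):
    """Binary entropy in nats."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log(d) - (1 - d) * math.log(1 - d)

def V_by_search(theta, alpha, steps=100_000):
    """Grid search of max_delta h(delta) + alpha*[delta ln(theta) + (1-delta) ln(1-theta)].

    For binary X, H(X|Y=y) = h(delta_y), and the objective is maximized separately
    for each y by the same delta, so one scalar search covers all P_{X|Y}."""
    best = -math.inf
    for i in range(steps + 1):
        d = i / steps
        val = binary_entropy(d) + alpha * (d * math.log(theta) + (1 - d) * math.log(1 - theta))
        best = max(best, val)
    return best
```

Combined with (38), `math.log(2) - V(theta, alpha)` then gives the minimum appearing in (40).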

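Using (9), we get for the lower bound $\xi_L(R)$ of (12):

$$
\begin{aligned}
\xi_L(R)&=\min_{P_Y}\min_{\theta,\theta'\in\Theta}\max_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\min_{P_{X|Y},P_{X'|Y}}\frac{A(\theta,1-\lambda\rho,P_{XY})+\rho A(\theta',\lambda,P_{X'Y})-H(Y)-\rho R}{(1-\lambda\rho)E^*(\theta)+\lambda\rho E^*(\theta')}\\
&=\min_{P_Y}\min_{\theta,\theta'\in\Theta}\max_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\frac{(1+\rho)\ln2-V(\theta,1-\lambda\rho)-\rho V(\theta',\lambda)-H(Y)-\rho R}{(1-\lambda\rho)E^*(\theta)+\lambda\rho E^*(\theta')}\\
&\ge\min_{\theta,\theta'\in\Theta}\max_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\min_{P_Y}\frac{(1+\rho)\ln2-V(\theta,1-\lambda\rho)-\rho V(\theta',\lambda)-H(Y)-\rho R}{(1-\lambda\rho)E^*(\theta)+\lambda\rho E^*(\theta')}\\
&=\min_{\theta,\theta'\in\Theta}\max_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\frac{\rho\ln2-V(\theta,1-\lambda\rho)-\rho V(\theta',\lambda)-\rho R}{(1-\lambda\rho)E^*(\theta)+\lambda\rho E^*(\theta')},
\end{aligned}
\qquad(41)
$$

where the second equality follows from (40), the inequality follows from interchanging the minimization over $P_Y$ with the maximization over $\rho$ and $\lambda$, and the last equality holds since $H(Y)\le\ln2$, with equality for the uniform $P_Y$.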

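Now, the random coding error exponent associated with ML decoding, $E^*(\theta)$, to which the minimax decoding error exponent is compared, is achieved for the BSC model by the following optimization (see [10, Sect. 3.1, 3.2 and 3.4]):

$$
\begin{aligned}
E^*(\theta)&=\max_{0\le\rho\le1}\max_{Q}\Big\{-\ln\sum_{y\in\{0,1\}}\Big[\sum_{x\in\{0,1\}}Q(x)\,P_\theta(y|x)^{\frac{1}{1+\rho}}\Big]^{1+\rho}-\rho R\Big\}\\
&\overset{(a)}{=}\max_{0\le\rho\le1}\Big\{\rho\ln2-(1+\rho)\ln\Big[(1-\theta)^{\frac{1}{1+\rho}}+\theta^{\frac{1}{1+\rho}}\Big]-\rho R\Big\}\\
&\triangleq\max_{0\le\rho\le1}E_r(\theta,\rho),
\end{aligned}
\qquad(42)
$$

where in (a), the inner maximization is achieved by taking $Q^*=\{\tfrac12,\tfrac12\}$ ([10, Sect. 3.4]).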
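A small Python sketch of (42) follows; it is not the authors' code and the function names are hypothetical. It evaluates Gallager's function for a BSC with an arbitrary binary input distribution $Q=(q_0,1-q_0)$, compares the uniform case with the closed form $E_r(\theta,\rho)$, and approximates $E^*(\theta)$ by a grid search over $0\le\rho\le1$ (the grid search is a simplifying assumption; any scalar maximizer would do).

```python
import math

def gallager_E0(theta, rho, q0=0.5):
    """-ln sum_y [sum_x Q(x) P_theta(y|x)^{1/(1+rho)}]^{1+rho} for a BSC(theta), Q = (q0, 1-q0)."""
    P = {(0, 0): 1 - theta, (1, 0): theta, (0, 1): theta, (1, 1): 1 - theta}  # P[(y, x)]
    Q = {0: q0, 1: 1 - q0}
    a = 1 / (1 + rho)
    total = sum(sum(Q[x] * P[(y, x)] ** a for x in (0, 1)) ** (1 + rho) for y in (0, 1))
    return -math.log(total)

def Er(theta, rho, R):
    """Closed form in (42) for the uniform input distribution."""
    a = 1 / (1 + rho)
    return rho * math.log(2) - (1 + rho) * math.log((1 - theta) ** a + theta ** a) - rho * R

def E_star(theta, R, steps=10_000):
    """E*(theta) = max over 0 <= rho <= 1 of Er(theta, rho), approximated on a grid."""
    return max(Er(theta, i / steps, R) for i in range(steps + 1))
```

For symmetric channels such as the BSC, the uniform input maximizes Gallager's function, which is what step (a) uses; the sketch can be used to observe this for a few values of $q_0$.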
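Let us now define $\rho'=\frac{\lambda\rho}{1-\lambda\rho}$ and $\rho''=\frac1\lambda-1$, so that $1-\lambda\rho=\frac{1}{1+\rho'}$ and $\lambda=\frac{1}{1+\rho''}$, and rewrite the numerator of (41) as follows:

$$
\begin{aligned}
&\rho\ln2-V(\theta,1-\lambda\rho)-\rho V(\theta',\lambda)-\rho R\\
&\quad=\rho\ln2-V\Big(\theta,\frac{1}{1+\rho'}\Big)-\rho V\Big(\theta',\frac{1}{1+\rho''}\Big)-\rho R\\
&\quad=(1-\lambda\rho)\Big[\rho'\ln2-(1+\rho')V\Big(\theta,\frac{1}{1+\rho'}\Big)-\rho'R\Big]+\lambda\rho\Big[\rho''\ln2-(1+\rho'')V\Big(\theta',\frac{1}{1+\rho''}\Big)-\rho''R\Big]\\
&\quad=(1-\lambda\rho)E_r(\theta,\rho')+\lambda\rho E_r(\theta',\rho'')\\
&\quad=(1-\lambda\rho)E_r\Big(\theta,\frac{\lambda\rho}{1-\lambda\rho}\Big)+\lambda\rho E_r\Big(\theta',\frac1\lambda-1\Big).
\end{aligned}
\qquad(43)
$$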
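The rewriting (43) is a purely algebraic identity in $\rho'=\lambda\rho/(1-\lambda\rho)$ and $\rho''=1/\lambda-1$. The sketch below (hypothetical helper names) evaluates both sides numerically, assuming $\lambda>0$ and $\lambda\rho<1$ so that $\rho'$ is finite; note that $E_r(\theta,\cdot)$ is evaluated through its closed form even when its second argument lies outside $[0,1]$.

```python
import math

def V(theta, alpha):
    """V(theta, alpha) = ln[(1 - theta)^alpha + theta^alpha], cf. (39)."""
    return math.log((1 - theta) ** alpha + theta ** alpha)

def Er(theta, rho, R):
    """Er(theta, rho) = rho ln2 - (1 + rho) V(theta, 1/(1 + rho)) - rho R, cf. (42)."""
    return rho * math.log(2) - (1 + rho) * V(theta, 1 / (1 + rho)) - rho * R

def numerator_41(theta, theta_p, rho, lam, R):
    """Numerator of the last line of (41)."""
    return rho * math.log(2) - V(theta, 1 - lam * rho) - rho * V(theta_p, lam) - rho * R

def decomposition_43(theta, theta_p, rho, lam, R):
    """Right-hand side of (43) with rho' = lam*rho/(1 - lam*rho) and rho'' = 1/lam - 1."""
    rp = lam * rho / (1 - lam * rho)
    rpp = 1 / lam - 1
    return (1 - lam * rho) * Er(theta, rp, R) + lam * rho * Er(theta_p, rpp, R)
```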



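Finally, we get that

$$\xi_L(R)\ge\min_{\theta,\theta'\in\Theta}\max_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\frac{(1-\lambda\rho)E_r\big(\theta,\frac{\lambda\rho}{1-\lambda\rho}\big)+\lambda\rho E_r\big(\theta',\frac1\lambda-1\big)}{(1-\lambda\rho)E^*(\theta)+\lambda\rho E^*(\theta')}.\qquad(44)$$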

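Now, choose $\lambda=\frac{1}{1+\hat\rho'}$, where $\hat\rho'$ is the achiever of $E^*(\theta')=\max_{0\le\rho\le1}E_r(\theta',\rho)$, and $\rho=\frac{\hat\rho}{1+\hat\rho}(1+\hat\rho')$, where $\hat\rho$ is the achiever of $E^*(\theta)=\max_{0\le\rho\le1}E_r(\theta,\rho)$. Since $\hat\rho\hat\rho'\le1$, we have $\frac{\hat\rho}{1+\hat\rho}(1+\hat\rho')\le1$, and $\lambda\rho=\frac{\hat\rho}{1+\hat\rho}\le\frac12$, so this choice is feasible. With this choice, $\frac{\lambda\rho}{1-\lambda\rho}=\hat\rho$ and $\frac1\lambda-1=\hat\rho'$, so both the numerator and the denominator of (44) equal $(1-\lambda\rho)E_r(\theta,\hat\rho)+\lambda\rho E_r(\theta',\hat\rho')$, and so $\xi_L(R)=1$.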
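The choice of $\lambda$ and $\rho$ above can be illustrated numerically. In the sketch below (hypothetical helper names), the achievers $\hat\rho$ and $\hat\rho'$ are approximated on a grid, which is an assumption of the illustration, and $E^*(\cdot)$ is taken to be $E_r$ at the grid achiever; the function checks feasibility and returns the ratio in (44) at that choice.

```python
import math

def V(theta, alpha):
    return math.log((1 - theta) ** alpha + theta ** alpha)

def Er(theta, rho, R):
    """Er(theta, rho) of (42), written via V."""
    return rho * math.log(2) - (1 + rho) * V(theta, 1 / (1 + rho)) - rho * R

def argmax_Er(theta, R, steps=20_000):
    """Grid approximation of the achiever of E*(theta) = max_{0<=rho<=1} Er(theta, rho)."""
    return max((i / steps for i in range(steps + 1)), key=lambda r: Er(theta, r, R))

def ratio_44(theta, theta_p, R):
    """Ratio in (44) at lambda = 1/(1 + rho_hat'), rho = rho_hat (1 + rho_hat') / (1 + rho_hat)."""
    rho_hat, rho_hat_p = argmax_Er(theta, R), argmax_Er(theta_p, R)
    lam = 1 / (1 + rho_hat_p)
    rho = rho_hat * (1 + rho_hat_p) / (1 + rho_hat)
    assert 0 <= rho <= 1 and lam * rho <= 1  # the choice is feasible
    w = lam * rho
    num = (1 - w) * Er(theta, w / (1 - w), R) + w * Er(theta_p, 1 / lam - 1, R)
    den = (1 - w) * Er(theta, rho_hat, R) + w * Er(theta_p, rho_hat_p, R)
    return num / den
```

The rates used below are chosen under the capacities of both channels, so that the denominator is positive.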

We should note that for the BSC model, the same conclusion (i.e., $\xi^*=1$) also holds for linear codes and systematic linear codes, since the optimal random coding distribution that was used is $Q^*=\{\tfrac12,\tfrac12\}$ (see (42)).

6 Proof of Theorem 3 

First, consider a given channel output related to the entire transmitted information sequence. Without loss of generality, the all-zero message is assumed to be transmitted. Let us now consider a segment of length $K+l$, $l>0$, of the transmitted information sequence, and any incorrect path diverging from it at node $j$ and remerging with it at node $j+K+l$ (note that the minimum length of a diverging path is $K$, since after a non-zero vector is inserted into the encoder, $K-1$ zero vectors are needed in order to return to the all-zero state).

We observe that the information sequence related to such an incorrect path has the following structure (we ignore the values of the information sequence outside the range $[j,\,j+K+l-1]$):

$$u_j,\ u_{j+1},\ \ldots,\ u_{j+l},\ \underbrace{0,\ \ldots,\ 0}_{K-1},$$

where all of the vectors are of length $b$.

In order for the incorrect path to diverge exactly from node $j$ to node $j+K+l$, $u_j$ and $u_{j+l}$ can be any of the $2^b-1$ non-zero vectors (thus, there are $(2^b-1)^2$ possibilities for their values), and each of the $l-1$ information vectors $u_{j+1},\ldots,u_{j+l-1}$ can be any binary vector of length $b$, with the restriction of no more than $K-2$ consecutive all-zero vectors (thus, there are at most $2^{b(l-1)}$ possibilities for their values). Therefore, the number of such incorrect paths, denoted by $M$, is upper-bounded by

$$M\le(2^b-1)^2\,2^{b(l-1)}<(2^b-1)\,2^{bl}.\qquad(45)$$
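The counting argument behind (45) can be reproduced by brute force for small parameters. In the Python sketch below (the function name is hypothetical), $b$-bit information vectors are encoded as integers, $l\ge1$ is assumed, and the $K-1$ trailing zero vectors are left implicit.

```python
from itertools import product

def count_incorrect_paths(b, K, l):
    """Count segments u_j, ..., u_{j+l} with u_j != 0, u_{j+l} != 0 and no run of
    more than K-2 consecutive all-zero vectors among u_{j+1}, ..., u_{j+l-1}."""
    nonzero = range(1, 2 ** b)   # the 2^b - 1 non-zero b-bit vectors
    anything = range(2 ** b)     # all 2^b b-bit vectors
    count = 0
    for first in nonzero:
        for last in nonzero:
            for middle in product(anything, repeat=l - 1):
                run = longest = 0
                for u in middle:
                    run = run + 1 if u == 0 else 0
                    longest = max(longest, run)
                if longest <= K - 2:
                    count += 1
    return count
```

For example, with $b=1$, $K=3$ and $l=4$, the middle part is a binary string of length 3 with no two consecutive zeros, of which there are 5.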

We next upper-bound the probability that an incorrect path is preferred over the correct path by the minimax decoder (which minimizes the metric $\rho(\cdot,\cdot)$), and then average this probability over the ensemble of time-varying convolutional codes.

We will use $V_j=[v_j,v_{j+1},\ldots,v_{j+K+l-1}]$ to denote the code vector of length $N=n(K+l)$ that corresponds to the correct all-zero path, while primed vectors such as $V_j'$ will be used to denote code vectors that correspond to other, incorrect paths. The notation $\bar V$ will be used for the complement of a vector $V$. A segment of length $N$ of the corresponding channel output will be denoted by $Y_j$, and $Q^*$ will be used to denote the random coding distribution.



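Averaging over the ensemble, we obtain:

$$
\begin{aligned}
\overline{\Pr}\{\rho(V_j',Y_j)<\rho(V_j,Y_j)\mid\theta\}
&=\sum_{V_j,V_j'}Q^*(V_j,V_j')\Pr\{\rho(V_j',Y_j)<\rho(V_j,Y_j)\mid\theta\}\\
&\overset{(a)}{=}2^{-2N}\sum_{V_j,V_j'}\Pr\{\rho(V_j',Y_j)<\rho(V_j,Y_j)\}\\
&=2^{-2N}\sum_{V_j,V_j'}\Pr\big\{\min\{\delta(V_j',Y_j),1-\delta(V_j',Y_j)\}<\min\{\delta(V_j,Y_j),1-\delta(V_j,Y_j)\}\big\}\\
&=2^{-2N}\sum_{V_j,V_j'}\Pr\big\{[\delta(V_j',Y_j)<\min\{\delta(V_j,Y_j),1-\delta(V_j,Y_j)\}]\\
&\qquad\qquad\cup[\delta(\bar V_j',Y_j)<\min\{\delta(V_j,Y_j),1-\delta(V_j,Y_j)\}]\big\}\\
&\overset{(b)}{\le}2^{-2N}\sum_{V_j,V_j'}\Pr\big\{[\delta(V_j',Y_j)<\delta(V_j,Y_j)]\cup[\delta(\bar V_j',Y_j)<\delta(V_j,Y_j)]\big\}\\
&\overset{(c)}{\le}2^{-2N}\sum_{V_j,V_j'}\big[\Pr\{\delta(V_j',Y_j)<\delta(V_j,Y_j)\}+\Pr\{\delta(\bar V_j',Y_j)<\delta(V_j,Y_j)\}\big]\\
&\overset{(d)}{=}2\cdot2^{-2N}\sum_{V_j,V_j'}\Pr\{\delta(V_j',Y_j)<\delta(V_j,Y_j)\}\\
&\overset{(e)}{\le}2\cdot2^{-2N}\sum_{V_j,V_j'}\sum_{Y_j}\sqrt{P_\theta(Y_j|V_j)\,P_\theta(Y_j|V_j')}\\
&\overset{(f)}{=}2\Big[\sum_y\Big(\sum_vQ^*(v)\sqrt{P_\theta(y|v)}\Big)^2\Big]^N=2e^{-NR_{\theta,0}(Q^*)},
\end{aligned}
\qquad(46)
$$

where

$$R_{\theta,0}(Q^*)=-\ln\sum_y\Big[\sum_vQ^*(v)\sqrt{P_\theta(y|v)}\Big]^2.$$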



In (a) we used the fact that both $V_j$ and $V_j'$ can attain each of their $2^N$ possible values equiprobably and independently. This claim for $V_j$ (which corresponds to the all-zero path) can be justified by the fact that the elements of $G_i$, $0\le i\le K-1$, and of $v_0$ are re-randomized at each time instant (see (14)). Therefore, for all $0\le i<K+l$, $v_{j+i}=v_0^{(j+i)}$, and thus each of these vectors attains each of its $2^n$ values equiprobably. The claim for $V_j'$ (which corresponds to the incorrect path) can be justified since $u_j$ and $u_{j+l}$ are non-zero and $u_{j+1},\ldots,u_{j+l-1}$ cannot include more than $K-2$ consecutive all-zero vectors. Thus, each code vector $v_{j+i}'$, $0\le i<K+l$, is formed by the modulo-2 sum of $v_0^{(j+i)}$ with at least one of the rows of $G_0^{(j+i)},G_1^{(j+i)},\ldots,G_{K-1}^{(j+i)}$, and therefore also attains each of its $2^n$ values with equal probability, independently of the other code vectors (this fact is treated in detail in [10, Sect. 5.1]). (b) is true since we switched to looser conditions inside each event in the probability term. In (c) we used the union bound. In (d) we used the fact that considering $\delta(\bar V_j',Y_j)$, when summing over all possible values of $V_j'$, is equivalent to considering $\delta(V_j',Y_j)$ (since in both cases, each of the $2^N$ values of the vector is covered by the summation). In (e) we used the Bhattacharyya bound on the pairwise error probability of the ML decision rule, and (f) is true since the channel is memoryless.
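For the BSC with the uniform $Q^*$, $R_{\theta,0}(Q^*)=\ln2-\ln\big(1+2\sqrt{\theta(1-\theta)}\big)$, and step (f) of (46) can be verified by exhaustive summation for a short block. The sketch below is an illustration with hypothetical function names; it computes the averaged Bhattacharyya sum (without the factor 2) and compares it with $e^{-NR_{\theta,0}(Q^*)}$.

```python
import math
from itertools import product

def bsc_prob(y, v, theta):
    """P_theta(y|v) for equal-length binary tuples over a memoryless BSC(theta)."""
    d = sum(a != b for a, b in zip(y, v))
    return theta ** d * (1 - theta) ** (len(y) - d)

def averaged_bhattacharyya(N, theta):
    """2^{-2N} sum_{V,V'} sum_Y sqrt(P(Y|V) P(Y|V')), i.e. the sum after step (e) in (46)."""
    vecs = list(product((0, 1), repeat=N))
    total = sum(math.sqrt(bsc_prob(y, v, theta) * bsc_prob(y, vp, theta))
                for v in vecs for vp in vecs for y in vecs)
    return total / 4 ** N

def R0(theta):
    """R_{theta,0}(Q*) = -ln sum_y [sum_v Q*(v) sqrt(P_theta(y|v))]^2 with uniform Q*."""
    P = {(0, 0): 1 - theta, (1, 0): theta, (0, 1): theta, (1, 1): 1 - theta}  # P[(y, v)]
    return -math.log(sum(sum(0.5 * math.sqrt(P[(y, v)]) for v in (0, 1)) ** 2 for y in (0, 1)))
```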

We have shown that the probability that another code segment is preferred by the minimax decoder over the correct segment, when averaged over the ensemble of time-varying convolutional codes, is upper-bounded by twice the bound obtained for the ML decoder in [10]; thus, it is of the same exponential order. The subsequent steps in deriving an upper bound to the bit error probability for rates $R<R_{\theta,0}(Q^*)$ are identical to those for the ML decoder (see [10, Sect. 5.1]), and the final result is the same.

Therefore, for rates up to $R_{\theta,0}(Q^*)$, the bit error probability exponent achievable with the minimax decoder is no smaller than the one achieved by the ML decoder when the channel parameter is known.

In order to extend the average upper bound on the bit error probability to rates higher than $R_{\theta,0}(Q^*)$, we use a slightly different technique.

First, we upper-bound $\pi_{l,\theta}(j)$, the probability that the minimax decoder prefers, over the correct path, any one of the other possible paths that diverge from it at node $j$ and remerge after $K+l$ branches. As mentioned in (45), the number of such diverging paths satisfies $M<(2^b-1)\,2^{bl}$. The code segments associated with these $M$ incorrect paths will be denoted by $V_j^{(1)},\ldots,V_j^{(M)}$, respectively.



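$$
\begin{aligned}
\pi_{l,\theta}(j)&=\Pr\{\exists\,1\le i\le M:\ \rho(V_j^{(i)},Y_j)<\rho(V_j,Y_j)\}\\
&=\Pr\big\{\exists\,1\le i\le M:\ \min\{\delta(V_j^{(i)},Y_j),1-\delta(V_j^{(i)},Y_j)\}<\min\{\delta(V_j,Y_j),1-\delta(V_j,Y_j)\}\big\}\\
&=\Pr\big\{\exists\,1\le i\le M:\ [\delta(V_j^{(i)},Y_j)<\min\{\delta(V_j,Y_j),1-\delta(V_j,Y_j)\}]\\
&\qquad\qquad\cup[\delta(\bar V_j^{(i)},Y_j)<\min\{\delta(V_j,Y_j),1-\delta(V_j,Y_j)\}]\big\}\\
&\overset{(a)}{\le}\Pr\big\{\exists\,1\le i\le M:\ [\delta(V_j^{(i)},Y_j)<\delta(V_j,Y_j)]\cup[\delta(\bar V_j^{(i)},Y_j)<\delta(V_j,Y_j)]\big\}\\
&\overset{(b)}{\le}\Pr\{\exists\,1\le i\le M:\ \delta(V_j^{(i)},Y_j)<\delta(V_j,Y_j)\}+\Pr\{\exists\,1\le i\le M:\ \delta(\bar V_j^{(i)},Y_j)<\delta(V_j,Y_j)\}\\
&\overset{(c)}{\le}\sum_{Y_j}P_\theta(Y_j|V_j)^{\frac{1}{1+\rho}}\Big[\sum_{i=1}^MP_\theta(Y_j|V_j^{(i)})^{\frac{1}{1+\rho}}\Big]^\rho+\sum_{Y_j}P_\theta(Y_j|V_j)^{\frac{1}{1+\rho}}\Big[\sum_{i=1}^MP_\theta(Y_j|\bar V_j^{(i)})^{\frac{1}{1+\rho}}\Big]^\rho,\quad\rho\ge0,
\end{aligned}
\qquad(47)
$$

where (a) is true since we increased the right-hand sides of the two inequalities, and thus increased the probability of the union of these two events; in (b), the union bound was used; and in (c), we used the Gallager bound on the error probability of the ML decision rule. This bound was applied to each of the two error probabilities.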
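We now upper-bound the average of $\pi_{l,\theta}(j)$ over the ensemble of time-varying convolutional codes:

$$
\begin{aligned}
\overline{\pi_{l,\theta}(j)}
&\overset{(a)}{=}\sum_{V_j}\sum_{V_j^{(1)}}\cdots\sum_{V_j^{(M)}}2^{-(M+1)N}\,\pi_{l,\theta}(j)\\
&\overset{(b)}{\le}\sum_{V_j}\sum_{V_j^{(1)}}\cdots\sum_{V_j^{(M)}}2^{-(M+1)N}\sum_{Y_j}P_\theta(Y_j|V_j)^{\frac{1}{1+\rho}}\Big\{\Big[\sum_{i=1}^MP_\theta(Y_j|V_j^{(i)})^{\frac{1}{1+\rho}}\Big]^\rho+\Big[\sum_{i=1}^MP_\theta(Y_j|\bar V_j^{(i)})^{\frac{1}{1+\rho}}\Big]^\rho\Big\}\\
&\overset{(c)}{=}2\sum_{V_j}\sum_{Y_j}2^{-N}P_\theta(Y_j|V_j)^{\frac{1}{1+\rho}}\sum_{V_j^{(1)}}\cdots\sum_{V_j^{(M)}}2^{-MN}\Big[\sum_{i=1}^MP_\theta(Y_j|V_j^{(i)})^{\frac{1}{1+\rho}}\Big]^\rho\\
&\overset{(d)}{\le}2\sum_{V_j}\sum_{Y_j}2^{-N}P_\theta(Y_j|V_j)^{\frac{1}{1+\rho}}\Big[\sum_{V_j^{(1)}}\cdots\sum_{V_j^{(M)}}2^{-MN}\sum_{i=1}^MP_\theta(Y_j|V_j^{(i)})^{\frac{1}{1+\rho}}\Big]^\rho,\quad0\le\rho\le1\\
&\overset{(e)}{=}2\sum_{V_j}\sum_{Y_j}2^{-N}P_\theta(Y_j|V_j)^{\frac{1}{1+\rho}}\Big[\sum_{i=1}^M\sum_{V_j^{(i)}}2^{-N}P_\theta(Y_j|V_j^{(i)})^{\frac{1}{1+\rho}}\Big]^\rho,\quad0\le\rho\le1\\
&\overset{(f)}{\le}2(2^b-1)^\rho\,2^{\rho bl}\sum_{V_j}\sum_{Y_j}2^{-N}P_\theta(Y_j|V_j)^{\frac{1}{1+\rho}}\Big[\sum_{V}2^{-N}P_\theta(Y_j|V)^{\frac{1}{1+\rho}}\Big]^\rho,\quad0\le\rho\le1\\
&=2(2^b-1)^\rho\,2^{\rho bl}\sum_{Y_j}\Big[\sum_{V}2^{-N}P_\theta(Y_j|V)^{\frac{1}{1+\rho}}\Big]^{1+\rho},\quad0\le\rho\le1\\
&\overset{(g)}{=}2(2^b-1)^\rho\,2^{\rho bl}\Big[\sum_y\Big(\sum_vQ^*(v)P_\theta(y|v)^{\frac{1}{1+\rho}}\Big)^{1+\rho}\Big]^N\\
&=2(2^b-1)^\rho\,2^{\rho bl}\,e^{-(K+l)nE_{\theta,0}(\rho,Q^*)},\quad0\le\rho\le1,
\end{aligned}
\qquad(48)
$$

where

$$E_{\theta,0}(\rho,Q^*)=-\ln\sum_y\Big[\sum_vQ^*(v)P_\theta(y|v)^{\frac{1}{1+\rho}}\Big]^{1+\rho}.$$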



In (a), we sum over all possible code vectors associated with the different paths in the trellis; as explained earlier, each code vector can attain all of its $2^N$ values equiprobably and independently of the other code vectors. In (b), we used the result of (47). (c) is true since examining $P_\theta(Y_j|\bar V_j^{(i)})$, $1\le i\le M$, when summing over all possible values of $V_j^{(1)},\ldots,V_j^{(M)}$, is equivalent to examining $P_\theta(Y_j|V_j^{(i)})$, $1\le i\le M$. In (d), we restrict ourselves to $0\le\rho\le1$ and use Jensen's inequality. (e) is true since, for a fixed $i$, $P_\theta(Y_j|V_j^{(i)})$ depends only on $V_j^{(i)}$ and is enumerated for the $2^{N(M-1)}$ possibilities of $V_j^{(1)},\ldots,V_j^{(i-1)},V_j^{(i+1)},\ldots,V_j^{(M)}$. In (f), we upper-bound $M$ by $(2^b-1)\,2^{bl}$, and (g) is true since the BSC is memoryless.
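Step (g) of (48), the factorization over a memoryless BSC, can likewise be checked by brute force for a small block length, together with the fact that $E_{\theta,0}(1,Q^*)=R_{\theta,0}(Q^*)$, which links the exponents used in the two rate regimes of the proof. The helper names below are hypothetical and the code is only a sketch for the uniform $Q^*$.

```python
import math
from itertools import product

def bsc_prob(y, v, theta):
    """P_theta(y|v) for equal-length binary tuples over a memoryless BSC(theta)."""
    d = sum(a != b for a, b in zip(y, v))
    return theta ** d * (1 - theta) ** (len(y) - d)

def block_gallager_sum(N, theta, rho):
    """sum_Y [sum_V 2^{-N} P_theta(Y|V)^{1/(1+rho)}]^{1+rho} over all binary Y, V of length N."""
    vecs = list(product((0, 1), repeat=N))
    a = 1 / (1 + rho)
    return sum((sum(bsc_prob(y, v, theta) ** a for v in vecs) / 2 ** N) ** (1 + rho)
               for y in vecs)

def E0(theta, rho):
    """E_{theta,0}(rho, Q*) for the BSC with uniform Q*."""
    a = 1 / (1 + rho)
    inner = 0.5 * (1 - theta) ** a + 0.5 * theta ** a  # same value for y = 0 and y = 1
    return -math.log(2 * inner ** (1 + rho))
```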

As in the above proof for rates up to $R_{\theta,0}(Q^*)$, the subsequent steps in deriving an upper bound to the bit error probability of the minimax decoder for rates $R_{\theta,0}(Q^*)\le R<C_\theta$ are identical to those for the ML decoder (see [10, Sect. 5.1]), and the final result is the same. This completes the proof that the achievable exponent of the bit error probability of the minimax decoder equals that of the ML decoder, for all rates up to capacity.

A. Appendix 

A.1 Proof of eq. (12) for Ensembles of Linear and Systematic Linear Codes

In this section, we examine the performance of the minimax decoding rule with respect to uniform i.i.d. random coding over the ensembles of linear codes and systematic linear codes. We will prove that for a family of BIOS channels, the same single-letter formula for the lower bound to the achievable fraction $\xi^*$ is obtained, with the uniform i.i.d. random coding distribution $Q^*=\{\tfrac12,\tfrac12\}$ (i.e., $\Lambda^*(P)=\ln2-H(P)$).

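Using Gallager's techniques, we first upper-bound the decoding error probability, given that the $m$-th message was sent, for a given $\theta$ in the following way:

$$
\begin{aligned}
P_{E_m}(\theta)&=\sum_{y\in\mathcal{Y}^N}P_\theta(y|v_m)\,1\Big\{\exists m'\ne m:\ \max_{\theta'\in\Theta_N}f_{\theta'}(v_{m'},y)\ge\max_{\theta''\in\Theta_N}f_{\theta''}(v_m,y)\Big\}\\
&=\sum_{y\in\mathcal{Y}^N}P_\theta(y|v_m)\,1\Big\{\exists\theta'\in\Theta_N,\ m'\ne m:\ f_{\theta'}(v_{m'},y)\ge\max_{\theta''\in\Theta_N}f_{\theta''}(v_m,y)\Big\}\\
&\overset{(a)}{=}\sum_{y\in\mathcal{Y}^N}P_\theta(y|v_m)\max_{\theta'\in\Theta_N}1\Big\{\exists m'\ne m:\ f_{\theta'}(v_{m'},y)\ge\max_{\theta''\in\Theta_N}f_{\theta''}(v_m,y)\Big\}\\
&=\sum_{y\in\mathcal{Y}^N}P_\theta(y|v_m)\max_{\theta'\in\Theta_N}1\Big\{\exists m'\ne m:\ \frac{e^{Nf_{\theta'}(v_{m'},y)}}{\max_{\theta''\in\Theta_N}e^{Nf_{\theta''}(v_m,y)}}\ge1\Big\}\\
&\overset{(b)}{\le}\sum_{y\in\mathcal{Y}^N}P_\theta(y|v_m)\max_{\theta'\in\Theta_N}\min_{\rho\ge0,\,\lambda\ge0}\Big[\sum_{m'\ne m}\Big(\frac{e^{Nf_{\theta'}(v_{m'},y)}}{\max_{\theta''\in\Theta_N}e^{Nf_{\theta''}(v_m,y)}}\Big)^{\lambda}\Big]^{\rho},
\end{aligned}
\qquad(49)
$$

where $\rho\ge0$ and $\lambda\ge0$ are free parameters.

(a) is true since, if we denote by $A(\theta)$ a function of $\theta\in\Theta_N$ and by $C$ a constant, then

$$1\{\exists\theta\in\Theta_N:\ A(\theta)\ge C\}=\max_{\theta\in\Theta_N}1\{A(\theta)\ge C\}.$$

(b) is true since, if we denote by $f_1(m')$ and $f_2(m)$ two non-negative functions of $m'$ and $m$, respectively, then (using Gallager's technique)

$$1\Big\{\exists m'\ne m:\ \frac{f_1(m')}{f_2(m)}\ge1\Big\}\le\min_{\rho\ge0,\,\lambda\ge0}\Big[\sum_{m'\ne m}\Big(\frac{f_1(m')}{f_2(m)}\Big)^\lambda\Big]^\rho.$$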



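Based on (49), we now develop an upper bound to the minimax criterion related to a specific linear code (i.e., specific values of $G$ and $v_0$; hence it is denoted by $S_N(v_0,G)$):

$$
\begin{aligned}
S_N(v_0,G)&=\max_\theta\Big\{\frac{P_E(\theta)}{e^{-N\xi E^*(\theta)}}\Big\}=\max_\theta\Big\{\frac1M\sum_{m=0}^{M-1}\frac{P_{E_m}(\theta)}{e^{-N\xi E^*(\theta)}}\Big\}\\
&\le\max_\theta\Big\{\frac1M\sum_{m=0}^{M-1}e^{N\xi E^*(\theta)}\sum_{y\in\mathcal{Y}^N}P_\theta(y|v_m)\max_{\theta'\in\Theta_N}\min_{\rho\ge0,\,\lambda\ge0}\Big[\sum_{m'\ne m}\frac{e^{N\lambda f_{\theta'}(v_{m'},y)}}{\big(\max_{\theta''\in\Theta_N}e^{Nf_{\theta''}(v_m,y)}\big)^\lambda}\Big]^\rho\Big\}\\
&=\max_\theta\Big\{\frac1M\sum_{m=0}^{M-1}\sum_{y\in\mathcal{Y}^N}e^{Nf_\theta(v_m,y)}\max_{\theta'\in\Theta_N}\min_{\rho\ge0,\,\lambda\ge0}\Big[\sum_{m'\ne m}\frac{e^{N\lambda f_{\theta'}(v_{m'},y)}}{\big(\max_{\theta''\in\Theta_N}e^{Nf_{\theta''}(v_m,y)}\big)^\lambda}\Big]^\rho\Big\}\\
&\overset{(a)}{\le}\frac1M\sum_{m=0}^{M-1}\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}e^{Nf_\theta(v_m,y)}\max_{\theta'\in\Theta_N}\min_{\rho\ge0,\,\lambda\ge0}\frac{\big[\sum_{m'\ne m}e^{N\lambda f_{\theta'}(v_{m'},y)}\big]^\rho}{\big(\max_{\theta''\in\Theta_N}e^{Nf_{\theta''}(v_m,y)}\big)^{\lambda\rho}}\\
&\overset{(b)}{=}\frac1M\sum_{m=0}^{M-1}\sum_{y\in\mathcal{Y}^N}\max_{\theta'\in\Theta_N}\min_{\rho\ge0,\,\lambda\ge0}\Big(\max_{\theta\in\Theta_N}e^{Nf_\theta(v_m,y)}\Big)^{1-\lambda\rho}\Big[\sum_{m'\ne m}e^{N\lambda f_{\theta'}(v_{m'},y)}\Big]^\rho\\
&\overset{(c)}{\le}\frac1M\sum_{m=0}^{M-1}\sum_{y\in\mathcal{Y}^N}\max_{\theta'\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}\max_{\theta\in\Theta_N}\Big\{e^{N(1-\lambda\rho)f_\theta(v_m,y)}\Big[\sum_{m'\ne m}e^{N\lambda f_{\theta'}(v_{m'},y)}\Big]^\rho\Big\}\\
&\overset{(d)}{=}\frac1M\sum_{m=0}^{M-1}\sum_{y\in\mathcal{Y}^N}\max_{\theta'\in\Theta_N}\max_{\theta\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}\Big\{e^{N(1-\lambda\rho)f_\theta(v_m,y)}\Big[\sum_{m'\ne m}e^{N\lambda f_{\theta'}(v_{m'},y)}\Big]^\rho\Big\}.
\end{aligned}
\qquad(50)
$$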



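The passages (a)-(d) are explained as follows. In (a) we used the fact that the maximum of an expectation is no greater than the expectation of the maximum, and changed the maximization over $\theta$ to be over $\Theta_N$. (b) is true since $\theta$ and $\theta''$ maximize two identical expressions, and can therefore be united. In (c) we restricted the range of the optimization to $1-\lambda\rho\ge0$ (i.e., $0\le\lambda\le1/\rho$). In (d) we used the fact that for given $v_m$, $y$ and $\theta'$:

$$\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}\max_{\theta\in\Theta_N}\Big\{e^{N(1-\lambda\rho)f_\theta(v_m,y)}\Big[\sum_{m'\ne m}e^{N\lambda f_{\theta'}(v_{m'},y)}\Big]^\rho\Big\}=\max_{\theta\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}\Big\{e^{N(1-\lambda\rho)f_\theta(v_m,y)}\Big[\sum_{m'\ne m}e^{N\lambda f_{\theta'}(v_{m'},y)}\Big]^\rho\Big\}.\qquad(51)$$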



This interchange between the minimization over $\lambda$ and $\rho$ and the maximization over $\theta$ is justified in the Appendix, Section A.2.

Prior to deriving the single-letter formula for the lower bound to $\xi^*$, we first present the following claim:

Lemma 2 When a linear code is used over a BIOS channel and minimax decoding is used, the error probability of the $m$-th message is the same for all $0\le m\le M-1$.

This lemma is proved in Section A.4 of the Appendix.

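Based on this observation, we can assume, without loss of generality, that the message $m=0$ (i.e., $u_0=0$, whose code vector is $v_0$) was transmitted; the upper bound to $S_N$ can then be expressed as:

$$S_N(v_0,G)\le\sum_{y\in\mathcal{Y}^N}\max_{\theta'\in\Theta_N}\max_{\theta\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}\Big\{e^{N(1-\lambda\rho)f_\theta(v_0,y)}\Big[\sum_{m=1}^{M-1}e^{N\lambda f_{\theta'}(v_m,y)}\Big]^\rho\Big\}.\qquad(52)$$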

In the following subsections, we will use the same technique to derive two upper bounds 
on the minimax criterion, one for the ensemble of linear codes and one for the ensemble of 
systematic linear codes. 

Linear Codes 

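By averaging $S_N$ over the ensemble of linear codes:

$$
\begin{aligned}
\overline{S}_N&\overset{(a)}{=}2^{-(K+1)N}\sum_{v_0,G}S_N(v_0,G)\\
&\le2^{-(K+1)N}\sum_{v_0,G}\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}e^{N(1-\lambda\rho)f_\theta(v_0,y)}\Big[\sum_{m=1}^{M-1}e^{N\lambda f_{\theta'}(v_m,y)}\Big]^\rho
\end{aligned}
\qquad(53)
$$

$$
\begin{aligned}
&\overset{(b)}{\le}2^{-(K+1)N}\sum_{v_0,G}\sum_{y\in\mathcal{Y}^N}\sum_{\theta\in\Theta_N}\sum_{\theta'\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}e^{N(1-\lambda\rho)f_\theta(v_0,y)}\Big[\sum_{m=1}^{M-1}e^{N\lambda f_{\theta'}(v_m,y)}\Big]^\rho\\
&\overset{(c)}{\le}\sum_{y\in\mathcal{Y}^N}\sum_{\theta\in\Theta_N}\sum_{\theta'\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}2^{-(K+1)N}\sum_{v_0,G}e^{N(1-\lambda\rho)f_\theta(v_0,y)}\Big[\sum_{m=1}^{M-1}e^{N\lambda f_{\theta'}(v_m,y)}\Big]^\rho\\
&\overset{(d)}{\le}|\Theta_N|^2\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}2^{-N}\sum_{v_0}e^{N(1-\lambda\rho)f_\theta(v_0,y)}\,2^{-KN}\sum_G\Big[\sum_{m=1}^{M-1}e^{N\lambda f_{\theta'}(v_m,y)}\Big]^\rho\\
&\overset{(e)}{\le}|\Theta_N|^2\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}2^{-N}\sum_{v_0}e^{N(1-\lambda\rho)f_\theta(v_0,y)}\Big[2^{-KN}\sum_{m=1}^{M-1}\sum_Ge^{N\lambda f_{\theta'}(v_m,y)}\Big]^\rho
\end{aligned}
\qquad(54)
$$

$$
\begin{aligned}
&\overset{(f)}{=}|\Theta_N|^2\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}2^{-N}\sum_{v_0}e^{N(1-\lambda\rho)f_\theta(v_0,y)}\Big[(M-1)\,2^{-N}\sum_ve^{N\lambda f_{\theta'}(v,y)}\Big]^\rho\\
&\le|\Theta_N|^2\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}M^\rho\Big[2^{-N}\sum_ve^{N(1-\lambda\rho)f_\theta(v,y)}\Big]\Big[2^{-N}\sum_ve^{N\lambda f_{\theta'}(v,y)}\Big]^\rho
\end{aligned}
\qquad(55)
$$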



where the steps (a)-(f) are as follows. The equality in (a) is obtained by averaging over the $2^{(K+1)N}$ equiprobable values of $v_0$ and $G$. (b) and (d) follow from the fact that for a non-negative function $f(\theta)$,

$$\max_{\theta\in\Theta_N}f(\theta)\le\sum_{\theta\in\Theta_N}f(\theta)\le|\Theta_N|\cdot\max_{\theta\in\Theta_N}f(\theta).\qquad(56)$$

(c) is true since an expectation of a minimum is upper-bounded by the minimum of the expectation. In (e), we limit the optimization over $\rho$ to $0\le\rho\le1$ and use Jensen's inequality. In (f), we used the following equivalence for the two inner summations:

$$\sum_{m=1}^{M-1}\sum_Ge^{N\lambda f_{\theta'}(v_m,y)}=(M-1)\,2^{(K-1)N}\sum_ve^{N\lambda f_{\theta'}(v,y)}.\qquad(57)$$

This equivalence is proved in Section A.5 of the Appendix. The last inequality in (55) uses $M-1\le M$.

From (19), we conclude that the term inside the summation in (55) is identical for all $y$'s of the same type class. Thus, the summation can be conducted over types. Using (22), we continue to upper-bound $\overline{S}_N$ in the following way (note that the function $A(\theta,\alpha,P_{XY})$ used here corresponds to a binary i.i.d. random coding distribution, as specified in (11)):

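$$
\begin{aligned}
\overline{S}_N&\overset{(a)}{\le}|\Theta_N|^2\sum_{T_y}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\Big\{e^{N\rho R}\,e^{NH_y(Y)}\,e^{N[(1-\lambda\rho)\xi E^*(\theta)-\min_{P_{X|Y}}A(\theta,1-\lambda\rho,P_{XY})]}\,e^{N[\lambda\rho\xi E^*(\theta')-\rho\min_{P_{X'|Y}}A(\theta',\lambda,P_{X'Y})]}\Big\}\\
&=|\Theta_N|^2\sum_{T_y}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\exp\Big\{N\Big[\rho R+H_y(Y)+(1-\lambda\rho)\xi E^*(\theta)-\min_{P_{X|Y}}A(\theta,1-\lambda\rho,P_{XY})\\
&\qquad\qquad+\lambda\rho\xi E^*(\theta')-\rho\min_{P_{X'|Y}}A(\theta',\lambda,P_{X'Y})\Big]\Big\}\\
&\overset{(b)}{\le}|\Theta_N|^2(N+1)^{|\mathcal{Y}|}\max_{P_Y}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\exp\Big\{N\Big[\rho R+H_Y(Y)+(1-\lambda\rho)\xi E^*(\theta)-\min_{P_{X|Y}}A(\theta,1-\lambda\rho,P_{XY})\\
&\qquad\qquad+\lambda\rho\xi E^*(\theta')-\rho\min_{P_{X'|Y}}A(\theta',\lambda,P_{X'Y})\Big]\Big\}\\
&=|\Theta_N|^2(N+1)^{|\mathcal{Y}|}\exp\Big\{N\cdot\max_{P_Y}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\max_{P_{X|Y}}\max_{P_{X'|Y}}\big[\rho R+H_Y(Y)+(1-\lambda\rho)\xi E^*(\theta)\\
&\qquad\qquad-A(\theta,1-\lambda\rho,P_{XY})+\lambda\rho\xi E^*(\theta')-\rho A(\theta',\lambda,P_{X'Y})\big]\Big\},
\end{aligned}
\qquad(58)
$$

where in (a) we used (22) and upper-bounded $|T_y|$ by $e^{NH_y(Y)}$, and in (b) we upper-bounded the summation of the functional over the type classes $T_y$ by the product of its maximal value (achieved by a specific distribution $P_Y$) with $(N+1)^{|\mathcal{Y}|}$, which is an upper bound to the number of type classes $T_y$.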
As explained earlier, we seek the maximal $\xi$ such that $\overline{S}_N$ grows sub-exponentially with $N$. To this end, we can ignore the factor $|\Theta_N|^2(N+1)^{|\mathcal{Y}|}$ in (58), as it grows polynomially with $N$. Moreover, as argued before (31), the optimizations can be conducted over continuous distributions and over the entire parameter space $\Theta$. Thus, a maximal $\xi$ is sought such that (using (9)):

$$\max_{P_Y}\max_{\theta,\theta'\in\Theta}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}\max_{P_{X|Y}}\max_{P_{X'|Y}}\big[\rho R+(1-\lambda\rho)\xi E^*(\theta)+\lambda\rho\xi E^*(\theta')-B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},\lambda,\rho)\big]\le0.\qquad(59)$$



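An equivalent condition to (59) is

$$\forall P_Y,\ \forall\theta,\theta'\in\Theta,\ \exists\,0\le\rho\le1,\ 0\le\lambda\le1/\rho:\ \forall P_{X|Y},\ \forall P_{X'|Y}:\quad\rho R+(1-\lambda\rho)\xi E^*(\theta)+\lambda\rho\xi E^*(\theta')-B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},\lambda,\rho)\le0,$$

or,

$$\forall P_Y,\ \forall\theta,\theta'\in\Theta,\ \exists\,0\le\rho\le1,\ 0\le\lambda\le1/\rho:\ \forall P_{X|Y},\ \forall P_{X'|Y}:\quad\xi\le\frac{B(\theta,\theta',P_Y,P_{X|Y},P_{X'|Y},\lambda,\rho)-\rho R}{(1-\lambda\rho)E^*(\theta)+\lambda\rho E^*(\theta')}.$$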

Consequently, for ensembles of linear codes and BIOS channels, the lower bound to $\xi^*$ is the same as in (12), with the uniform i.i.d. random coding distribution $Q^*=\{\tfrac12,\tfrac12\}$.

Systematic Linear Codes 

A similar technique will now be used to obtain identical results for the ensemble of systematic linear codes.

By averaging $S_N$ over this ensemble:



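$$
\begin{aligned}
\overline{S}_N&\overset{(a)}{=}2^{-K(N-K)}\,2^{-N}\sum_{\tilde G}\sum_{v_0}S_N(v_0,G)\\
&\le2^{-K(N-K)}\,2^{-N}\sum_{\tilde G}\sum_{v_0}\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}e^{N(1-\lambda\rho)f_\theta(v_0,y)}\Big[\sum_{m=1}^{M-1}e^{N\lambda f_{\theta'}(v_m,y)}\Big]^\rho\\
&\overset{(b)}{\le}|\Theta_N|^2\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}2^{-N}\sum_{v_0}e^{N(1-\lambda\rho)f_\theta(v_0,y)}\Big[2^{-K(N-K)}\sum_{m=1}^{M-1}\sum_{\tilde G}e^{N\lambda f_{\theta'}(v_m,y)}\Big]^\rho\\
&\overset{(c)}{\le}|\Theta_N|^2\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}2^{-N}\sum_{v_0}e^{N(1-\lambda\rho)f_\theta(v_0,y)}\Big[2^{-K(N-K)}\,2^{(K-1)(N-K)}\sum_ve^{N\lambda f_{\theta'}(v,y)}\Big]^\rho\\
&\overset{(d)}{=}|\Theta_N|^2\sum_{y\in\mathcal{Y}^N}\max_{\theta\in\Theta_N}\max_{\theta'\in\Theta_N}\min_{\substack{0\le\rho\le1\\ 0\le\lambda\le1/\rho}}M^\rho\Big[2^{-N}\sum_ve^{N(1-\lambda\rho)f_\theta(v,y)}\Big]\Big[2^{-N}\sum_ve^{N\lambda f_{\theta'}(v,y)}\Big]^\rho.
\end{aligned}
\qquad(60)
$$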



The equality in (a) is obtained by averaging over the $2^N$ and $2^{K(N-K)}$ equiprobable values of $v_0$ and $\tilde G$ (the non-systematic part of $G$), respectively. (b) is obtained by taking the same steps as for the ensemble of linear codes in the previous subsection (see the inequalities between (53) and (54)). In (c), we used the following bound on the two inner summations:

$$\sum_{m=1}^{M-1}\sum_{\tilde G}e^{N\lambda f_{\theta'}(v_m,y)}\le2^{(K-1)(N-K)}\sum_ve^{N\lambda f_{\theta'}(v,y)}.\qquad(61)$$

This bound is proved in Section A.6 of the Appendix. In (d), we used the equality $M=2^K$.

Finally, the upper bound to $\overline{S}_N$ obtained in (60) is identical to the one related to ensembles of linear codes (see (55)), and therefore the final lower bound to $\xi^*$ for the case of systematic linear codes is also identical to (12), with the uniform i.i.d. random coding distribution $Q^*=\{\tfrac12,\tfrac12\}$.

A.2 Proof of eq. (26) and eq. (51) 

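Let $\theta^*$ maximize $f_\theta(v_m,y)$, and let $F(\lambda,\rho)$ be a non-negative function. Then,

$$
\begin{aligned}
\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}\max_{\theta\in\Theta_N}\Big\{e^{N(1-\lambda\rho)f_\theta(v_m,y)}F(\lambda,\rho)\Big\}
&=\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}\Big\{e^{N(1-\lambda\rho)f_{\theta^*}(v_m,y)}F(\lambda,\rho)\Big\}\\
&\overset{(a)}{\le}\max_{\theta\in\Theta_N}\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}\Big\{e^{N(1-\lambda\rho)f_\theta(v_m,y)}F(\lambda,\rho)\Big\}\\
&\le\min_{\substack{\rho\ge0\\ 0\le\lambda\le1/\rho}}\max_{\theta\in\Theta_N}\Big\{e^{N(1-\lambda\rho)f_\theta(v_m,y)}F(\lambda,\rho)\Big\},
\end{aligned}
\qquad(62)
$$

where the first equality holds since $1-\lambda\rho\ge0$, so that $\theta^*$ maximizes $e^{N(1-\lambda\rho)f_\theta(v_m,y)}$ for every feasible $(\lambda,\rho)$; (a) is true since the value of the function at a specific $\theta^*\in\Theta_N$ is always upper-bounded by its maximum over $\theta\in\Theta_N$; and the last inequality is the standard max-min inequality. Thus, all inequalities must hold with equality.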

A.3 Proof of eq. (22) 

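For $\alpha\in\mathbb{R}$ and $y\in\mathcal{Y}^N$, we exponentially evaluate $E\big[e^{N\alpha f_\theta(X,y)}\big]$, where the average is calculated over the ensemble of random coding distributions of the form $Q_N(x)=e^{-N\Lambda_N^*(P_x)}/|T_x|$ (as described in (22)):

$$
\begin{aligned}
E\big[e^{N\alpha f_\theta(X,y)}\big]&=\sum_{x\in\mathcal{X}^N}Q_N(x)\,e^{N\alpha f_\theta(x,y)}\\
&\overset{(a)}{=}\sum_{T_{x|y}\subset\mathcal{X}^N}|T_{x|y}|\,Q_N(x)\,e^{N\alpha f_\theta(x,y)}\\
&\overset{(b)}{=}\sum_{T_{x|y}\subset\mathcal{X}^N}|T_{x|y}|\,\frac{e^{-N[\Lambda^*(P_x)+\epsilon_N]}}{|T_x|}\,e^{N\alpha f_\theta(x,y)},
\end{aligned}
\qquad(63)
$$

where $\epsilon_N\to0$ as $N\to\infty$, independently of $P_x$.

We should note that (a) is true since $f_\theta(x,y)$ depends on $x$ and $y$ only via their joint empirical distribution, so the summation can be conducted over types instead; since the average is calculated for a given $y$, we sum over the conditional type classes $T_{x|y}$. In (b) we used the convergence assumption for the random coding distributions within the class $\mathcal{Q}$.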
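Thus, we continue to evaluate $E\big[e^{N\alpha f_\theta(X,y)}\big]$ as follows:

$$
\begin{aligned}
E\big[e^{N\alpha f_\theta(X,y)}\big]&\overset{(a)}{\doteq}\sum_{T_{x|y}\subset\mathcal{X}^N}\exp\{N[-I_{xy}(X;Y)-\Lambda^*(P_x)+\alpha f_\theta(x,y)]\}\\
&\overset{(b)}{\doteq}\exp\Big\{N\cdot\max_{P_{x|y}}\big[-I_{xy}(X;Y)-\Lambda^*(P_x)+\alpha f_\theta(x,y)\big]\Big\}\\
&\overset{(c)}{=}\exp\Big\{N\cdot\max_{P_{x|y}}\big[-I_{xy}(X;Y)-\Lambda^*(P_x)+\alpha E_{xy}\ln P_\theta(Y|X)+\alpha\xi E^*(\theta)\big]\Big\}\\
&=\exp\Big\{N\Big[\alpha\xi E^*(\theta)-\min_{P_{x|y}}\big\{I_{xy}(X;Y)+\Lambda^*(P_x)-\alpha E_{xy}\ln P_\theta(Y|X)\big\}\Big]\Big\}\\
&=e^{N[\alpha\xi E^*(\theta)-\min_{P_{x|y}}A(\theta,\alpha,P_{xy})]},
\end{aligned}
\qquad(64)
$$

where in (a), we used the facts that $|T_{x|y}|\doteq e^{NH_{xy}(X|Y)}$ and $|T_x|\doteq e^{NH_x(X)}$. (b) is true since the summation of the functional over $T_{x|y}\subset\mathcal{X}^N$ is lower-bounded by its maximal value (achieved by a specific distribution $P_{x|y}$), and upper-bounded by the product of its maximal value with $(N+1)^{|\mathcal{X}||\mathcal{Y}|}$. In (c), we expressed the minimax metric in terms of the joint empirical distribution as described in (19).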
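For the uniform i.i.d. ensemble ($\Lambda^*(P)=\ln2-H(P)$) and a BSC, the expectation in (64) can be computed in closed form: summing $\theta^{\alpha d}(1-\theta)^{\alpha(N-d)}$ over all $x$ gives $[(1-\theta)^\alpha+\theta^\alpha]^N$, so by (40) the normalized exponent equals $\alpha\xi E^*(\theta)-[\ln2-V(\theta,\alpha)]$ for every $N$, not only asymptotically. The Python sketch below checks this by enumerating all $x\in\{0,1\}^N$; the argument `xi_E` is a hypothetical stand-in for the constant $\xi E^*(\theta)$, and all helper names are hypothetical.

```python
import math
from itertools import product

def f_theta(x, y, theta, xi_E):
    """f_theta(x, y) = (1/N) ln P_theta(y|x) + xi*E*(theta) for a BSC; xi_E stands for xi*E*(theta)."""
    n = len(x)
    d = sum(a != b for a, b in zip(x, y))
    return (d * math.log(theta) + (n - d) * math.log(1 - theta)) / n + xi_E

def exponent_brute_force(y, theta, alpha, xi_E):
    """(1/N) ln E[exp(N alpha f_theta(X, y))] with X uniform on {0,1}^N."""
    n = len(y)
    total = sum(math.exp(n * alpha * f_theta(x, y, theta, xi_E))
                for x in product((0, 1), repeat=n))
    return math.log(total / 2 ** n) / n

def exponent_formula(theta, alpha, xi_E):
    """alpha*xi*E*(theta) - [ln 2 - V(theta, alpha)], cf. (40) and (64)."""
    V = math.log((1 - theta) ** alpha + theta ** alpha)
    return alpha * xi_E - (math.log(2) - V)
```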

A.4 Proof of Lemma 2

In this section, we prove that when a linear code is used over a BIOS channel and the minimax decision rule (denoted by $\Omega$) is used, the error probability of the $m$-th message, whose code vector of length $N$ is $v_m=(v_{m0},\ldots,v_{m(N-1)})$, is the same for all $m$, that is,

$$P_{E_m}(\Omega|\theta)=P_{E_0}(\Omega|\theta)\quad\text{for }0\le m\le M-1.\qquad(65)$$

Considering a binary-input channel, we denote the single-letter channel transition probabilities by $P_\theta(y|v=0)=P_{\theta,0}(y)$ and $P_\theta(y|v=1)=P_{\theta,1}(y)$. If the channel is also output-symmetric, then

$$P_{\theta,1}(y)=P_{\theta,0}(-y),\quad\forall y\in\mathcal{Y}.$$
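The error probability of the $m$-th message using minimax decoding is:

$$
\begin{aligned}
P_{E_m}(\Omega|\theta)&=\sum_{y\in\mathcal{A}_m}P_\theta(y|v_m)=\sum_{y\in\mathcal{A}_m}\prod_{n:\,v_{mn}=0}P_{\theta,0}(y_n)\prod_{n:\,v_{mn}=1}P_{\theta,1}(y_n)\\
&=\sum_{y\in\mathcal{A}_m}\prod_{n:\,v_{mn}=0}P_{\theta,0}(y_n)\prod_{n:\,v_{mn}=1}P_{\theta,0}(-y_n),
\end{aligned}
\qquad(66)
$$

where

$$
\begin{aligned}
\mathcal{A}_m&=\Big\{y:\ \max_{\theta'}\Big\{\tfrac1N\ln P_{\theta'}(y|v_{m'})+\xi E^*(\theta')\Big\}\ge\max_{\theta''}\Big\{\tfrac1N\ln P_{\theta''}(y|v_m)+\xi E^*(\theta'')\Big\}\ \text{for some }m'\ne m\Big\}\\
&=\Big\{y:\ \max_{\theta'}\Big\{\sum_{n=0}^{N-1}\ln P_{\theta'}(y_n|v_{m'n})+N\xi E^*(\theta')\Big\}\ge\max_{\theta''}\Big\{\sum_{n=0}^{N-1}\ln P_{\theta''}(y_n|v_{mn})+N\xi E^*(\theta'')\Big\}\ \text{for some }m'\ne m\Big\}\\
&=\Big\{y:\ \max_{\theta'}\Big\{\sum_{t:\,v_{mt}=0,\,v_{m't}=0}\ln P_{\theta',0}(y_t)+\sum_{t:\,v_{mt}=0,\,v_{m't}=1}\ln P_{\theta',1}(y_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=0}\ln P_{\theta',0}(y_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=1}\ln P_{\theta',1}(y_t)+N\xi E^*(\theta')\Big\}\\
&\qquad\ge\max_{\theta''}\Big\{\sum_{t:\,v_{mt}=0,\,v_{m't}=0}\ln P_{\theta'',0}(y_t)+\sum_{t:\,v_{mt}=0,\,v_{m't}=1}\ln P_{\theta'',0}(y_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=0}\ln P_{\theta'',1}(y_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=1}\ln P_{\theta'',1}(y_t)+N\xi E^*(\theta'')\Big\}\\
&\qquad\text{for some }m'\ne m\Big\}\\
&=\Big\{y:\ \max_{\theta'}\Big\{\sum_{t:\,v_{mt}=0,\,v_{m't}=0}\ln P_{\theta',0}(y_t)+\sum_{t:\,v_{mt}=0,\,v_{m't}=1}\ln P_{\theta',0}(-y_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=0}\ln P_{\theta',0}(y_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=1}\ln P_{\theta',0}(-y_t)+N\xi E^*(\theta')\Big\}\\
&\qquad\ge\max_{\theta''}\Big\{\sum_{t:\,v_{mt}=0,\,v_{m't}=0}\ln P_{\theta'',0}(y_t)+\sum_{t:\,v_{mt}=0,\,v_{m't}=1}\ln P_{\theta'',0}(y_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=0}\ln P_{\theta'',0}(-y_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=1}\ln P_{\theta'',0}(-y_t)+N\xi E^*(\theta'')\Big\}\\
&\qquad\text{for some }m'\ne m\Big\}.
\end{aligned}
\qquad(67)
$$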

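Using the following transformation to dummy variables,

$$z_n=\begin{cases}y_n, & v_{mn}=0,\\ -y_n, & v_{mn}=1,\end{cases}$$

we get that

$$P_{E_m}(\Omega|\theta)=\sum_{z\in\tilde{\mathcal{A}}_m}\prod_{n:\,v_{mn}=0}P_{\theta,0}(z_n)\prod_{n:\,v_{mn}=1}P_{\theta,0}(z_n)=\sum_{z\in\tilde{\mathcal{A}}_m}\prod_{n=0}^{N-1}P_{\theta,0}(z_n),\qquad(68)$$

where

$$
\begin{aligned}
\tilde{\mathcal{A}}_m&=\Big\{z:\ \max_{\theta'}\Big\{\sum_{t:\,v_{mt}=0,\,v_{m't}=0}\ln P_{\theta',0}(z_t)+\sum_{t:\,v_{mt}=0,\,v_{m't}=1}\ln P_{\theta',0}(-z_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=0}\ln P_{\theta',0}(-z_t)+\sum_{t:\,v_{mt}=1,\,v_{m't}=1}\ln P_{\theta',0}(z_t)+N\xi E^*(\theta')\Big\}\\
&\qquad\ge\max_{\theta''}\Big\{\sum_{t=0}^{N-1}\ln P_{\theta'',0}(z_t)+N\xi E^*(\theta'')\Big\}\ \text{for some }m'\ne m\Big\}\\
&=\Big\{z:\ \max_{\theta'}\Big\{\sum_{t:\,v_{mt}\oplus v_{m't}=0}\ln P_{\theta',0}(z_t)+\sum_{t:\,v_{mt}\oplus v_{m't}=1}\ln P_{\theta',0}(-z_t)+N\xi E^*(\theta')\Big\}\\
&\qquad\ge\max_{\theta''}\Big\{\sum_{t:\,v_{mt}\oplus v_{m't}=0}\ln P_{\theta'',0}(z_t)+\sum_{t:\,v_{mt}\oplus v_{m't}=1}\ln P_{\theta'',0}(z_t)+N\xi E^*(\theta'')\Big\}\ \text{for some }m'\ne m\Big\}.
\end{aligned}
\qquad(69)
$$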

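Now, (68) and (69) describe $P_{E_m}(\Omega|\theta)$ and $\tilde{\mathcal{A}}_m$, respectively, for each $0\le m\le M-1$, and they depend on $m$ only through the collection of difference vectors $\{v_m\oplus v_{m'}:\ m'\ne m\}$. Since the code is linear, $v_m\oplus v_{m'}=(u_m\oplus u_{m'})G$, and as $m'$ ranges over all indices other than $m$, $u_m\oplus u_{m'}$ ranges over all non-zero information vectors; hence this collection is the same for every $m$. In particular, the terms for $P_{E_0}(\Omega|\theta)$ and $\mathcal{A}_0$ (describing the case where $u_0=0$ is transmitted), obtained by assigning $m=0$ in (66) and (67) and applying the same transformation, coincide with (68) and (69), respectively. This observation completes the proof.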
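Lemma 2 can be illustrated by brute force for a small linear code over a BSC. In the sketch below (illustrative only; all names are hypothetical), the maximization over $\theta$ is restricted to a finite grid, and the term $\xi E^*(\theta)$ is replaced by a hypothetical function `bonus`; the argument of this section does not use the particular form of $E^*(\theta)$, so any such function should yield equal error probabilities for all messages. Counting ties as errors is an assumption of the sketch.

```python
import math
from itertools import product

def encode(u, G, v0):
    """Code vector u G (+) v0 over GF(2); G is a tuple of K rows of N bits."""
    out = list(v0)
    for ui, row in zip(u, G):
        if ui:
            out = [a ^ b for a, b in zip(out, row)]
    return tuple(out)

def minimax_metric(x, y, thetas, bonus):
    """N * f(x, y) on a theta grid for a BSC: max_theta ln P_theta(y|x) + N * bonus(theta)."""
    n = len(x)
    d = sum(a != b for a, b in zip(x, y))
    return max(d * math.log(t) + (n - d) * math.log(1 - t) + n * bonus(t) for t in thetas)

def error_probs(G, v0, theta, thetas, bonus):
    """P_{E_m}(Omega | theta) for every message m (ties are counted as errors)."""
    N = len(v0)
    codewords = [encode(u, G, v0) for u in product((0, 1), repeat=len(G))]
    probs = [0.0] * len(codewords)
    for y in product((0, 1), repeat=N):
        scores = [minimax_metric(c, y, thetas, bonus) for c in codewords]
        for m, c in enumerate(codewords):
            if any(scores[k] >= scores[m] for k in range(len(codewords)) if k != m):
                d = sum(a != b for a, b in zip(c, y))
                probs[m] += theta ** d * (1 - theta) ** (N - d)
    return probs
```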

A.5 Proof of eq. (57) 

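First, by the construction of the linear code, we know that:

$$v_{m'}=u_{m'}G\oplus v_0,\quad\forall\,0\le m'\le M-1.\qquad(70)$$

Since $1\le m'\le M-1$ implies $u_{m'}\ne0$, for each information vector in this set there is at least one index $i$ for which $u_{m'i}=1$. Consequently, each code vector $v_{m'}$, $1\le m'\le M-1$, can be written in the following way:

$$v_{m'}=u_{m'}G\oplus v_0=g_i\oplus\Big[\bigoplus_{j\ne i}u_{m'j}g_j\Big]\oplus v_0,$$

where $g_i$ stands for the $i$-th row of $G$. Therefore:

$$
\begin{aligned}
\sum_{m'=1}^{M-1}\sum_Ge^{N\lambda f_{\theta'}(v_{m'},y)}&=\sum_{m'=1}^{M-1}\sum_{G\setminus g_i}\sum_{g_i}e^{N\lambda f_{\theta'}\left(g_i\oplus[\bigoplus_{j\ne i}u_{m'j}g_j]\oplus v_0,\ y\right)}\\
&\overset{(a)}{=}\sum_{m'=1}^{M-1}\sum_{G\setminus g_i}\sum_{v\in\{0,1\}^N}e^{N\lambda f_{\theta'}(v,y)}\\
&=(M-1)\,2^{(K-1)N}\sum_ve^{N\lambda f_{\theta'}(v,y)},
\end{aligned}
\qquad(71)
$$

where (a) is true since, for fixed values of $m'$, $G\setminus g_i$ (in the outer summations) and $v_0$, the vector $w=[\bigoplus_{j\ne i}u_{m'j}g_j]\oplus v_0$ is fixed, so that as $g_i$ ranges over all binary vectors of length $N$, so does $v=g_i\oplus w$.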
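Identity (57) holds for any function of the code vector, because for a fixed non-zero $u$ the map $G\mapsto uG\oplus v_0$ hits every binary vector of length $N$ exactly $2^{(K-1)N}$ times. The sketch below checks this by enumerating all $K\times N$ generator matrices for tiny $K$ and $N$; the integer-valued `g` is a hypothetical stand-in for $e^{N\lambda f_{\theta'}(\cdot,y)}$, chosen so that the comparison is exact, and the function names are hypothetical.

```python
from itertools import product

def encode(u, G, v0):
    """Code vector u G (+) v0 over GF(2)."""
    out = list(v0)
    for ui, row in zip(u, G):
        if ui:
            out = [a ^ b for a, b in zip(out, row)]
    return tuple(out)

def lhs_57(K, N, v0, g):
    """sum_{m'=1}^{M-1} sum_G g(v_{m'}) over all K x N binary generator matrices G."""
    rows = list(product((0, 1), repeat=N))
    nonzero_msgs = [u for u in product((0, 1), repeat=K) if any(u)]
    return sum(g(encode(u, G, v0)) for G in product(rows, repeat=K) for u in nonzero_msgs)

def rhs_57(K, N, g):
    """(M - 1) 2^{(K-1)N} sum_v g(v)."""
    M = 2 ** K
    return (M - 1) * 2 ** ((K - 1) * N) * sum(g(v) for v in product((0, 1), repeat=N))
```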

A.6 Proof of eq. (61) 

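In this section, we prove the inequality (61), which is used in (60). First, by the construction of a systematic linear code:

$$v_{m'}=\Big[u_{m'}\,;\,\bigoplus_{i=1}^Ku_{m'i}\tilde g_i\Big]\oplus v_0=[u_{m'}\,;\,0\ldots0]\oplus\Big[0\ldots0\,;\,\bigoplus_{i=1}^Ku_{m'i}\tilde g_i\Big]\oplus v_0,\quad\forall\,0\le m'\le M-1,\qquad(72)$$

where the first block has length $K$, the second has length $N-K$, and $\tilde g_i$ stands for the $i$-th row of $\tilde G$ (the non-systematic part of $G$).

We observe that for $1\le m'\le M-1$, $u_{m'}\ne0$. Thus, for each information vector in this set there is at least one index $i$ for which $u_{m'i}=1$. Consequently, each code vector $v_{m'}$, $1\le m'\le M-1$, can be written in the following way:

$$v_{m'}=[u_{m'}\,;\,\tilde g_i]\oplus\Big[0\ldots0\,;\,\bigoplus_{j\ne i}u_{m'j}\tilde g_j\Big]\oplus v_0.$$

Therefore:

$$
\begin{aligned}
\sum_{m'=1}^{M-1}\sum_{\tilde G}e^{N\lambda f_{\theta'}(v_{m'},y)}&\overset{(a)}{=}\sum_{m'=1}^{M-1}\sum_{\tilde G\setminus\tilde g_i}\sum_{\tilde g_i}e^{N\lambda f_{\theta'}([u_{m'};\tilde g_i]\oplus w,\ y)}\\
&\overset{(b)}{\le}\sum_{\tilde G\setminus\tilde g_i}\sum_{m'=0}^{M-1}\sum_{\tilde g_i}e^{N\lambda f_{\theta'}([u_{m'};\tilde g_i]\oplus w,\ y)}\\
&\overset{(c)}{=}\sum_{\tilde G\setminus\tilde g_i}\sum_{v\in\{0,1\}^N}e^{N\lambda f_{\theta'}(v,y)}\\
&=2^{(K-1)(N-K)}\sum_ve^{N\lambda f_{\theta'}(v,y)},
\end{aligned}
\qquad(73)
$$

where (a) is true since, for fixed values of $m'$, $\tilde G\setminus\tilde g_i$ (in the outer summations) and $v_0$, the vector $w=[0\ldots0\,;\,\bigoplus_{j\ne i}u_{m'j}\tilde g_j]\oplus v_0$ is fixed. In (b), the term $m'=0$ was added to the summation; since the summand is always non-negative, the result cannot get smaller. (c) is true since, for a fixed $w$, summing over $0\le m'\le M-1$ and $\tilde g_i$ is equivalent to summing over all possibilities for a vector of length $N$.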
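Similarly, (61) can be checked exhaustively for a tiny systematic code. In the sketch below (hypothetical names; `g` again stands in for $e^{N\lambda f_{\theta'}(\cdot,y)}$), the brute-force sums also show that the difference between the two sides of (61) equals $2^{(K-1)(N-K)}$ times the sum of `g` over the vectors whose systematic part coincides with that of $v_0$.

```python
from itertools import product

def encode_systematic(u, Gt, v0):
    """[u ; (+)_i u_i gt_i] (+) v0, where Gt is the K x (N-K) non-systematic part."""
    parity = [0] * (len(v0) - len(u))
    for ui, row in zip(u, Gt):
        if ui:
            parity = [a ^ b for a, b in zip(parity, row)]
    return tuple(a ^ b for a, b in zip(tuple(u) + tuple(parity), v0))

def lhs_61(K, N, v0, g):
    """sum_{m'=1}^{M-1} sum_{Gt} g(v_{m'}) over all K x (N-K) binary matrices Gt."""
    rows = list(product((0, 1), repeat=N - K))
    nonzero_msgs = [u for u in product((0, 1), repeat=K) if any(u)]
    return sum(g(encode_systematic(u, Gt, v0))
               for Gt in product(rows, repeat=K) for u in nonzero_msgs)

def rhs_61(K, N, g):
    """2^{(K-1)(N-K)} sum_v g(v)."""
    return 2 ** ((K - 1) * (N - K)) * sum(g(v) for v in product((0, 1), repeat=N))
```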

A.7 Equivalence between decision rules $\Omega$ and $\Delta$

In this section, we prove the equivalence between the minimax decision rule $\Omega$, which maximizes the metric $f(x,y)$ (as defined in (3)), and a decision rule $\Delta$, which minimizes $\rho(x,y)$ (as defined in (17)). We will prove that for a given output $y\in\mathcal{Y}^N$, any $x_1,x_2\in\mathcal{X}^N$ satisfy:

$$f(x_1,y)\ge f(x_2,y)\iff\rho(x_1,y)\le\rho(x_2,y).\qquad(74)$$

First, we should note that $f(x,y)$ satisfies:

$$
\begin{aligned}
f(x,y)&=\max_{0\le\theta\le1}f_\theta(x,y)\\
&\overset{(a)}{=}\max_{0\le\theta\le1}\Big\{\frac1N\big[d(x,y)\ln\theta+(N-d(x,y))\ln(1-\theta)+N\xi E^*(\theta)\big]\Big\}\\
&=\max_{0\le\theta\le1}\big\{\delta(x,y)\ln\theta+(1-\delta(x,y))\ln(1-\theta)+\xi E^*(\theta)\big\}\\
&=f(\delta(x,y)),\quad0\le\delta(x,y)\le1.
\end{aligned}
\qquad(75)
$$

In (a), we used the following representation of the BSC transition probability:

$$P_\theta(y|x)=\theta^{d(x,y)}(1-\theta)^{N-d(x,y)},$$

where $d(x,y)$ is the Hamming distance between $x$ and $y$, and $\delta(x,y)=d(x,y)/N$. We conclude that the value of $f(x,y)$ is equal for all code vectors with the same (normalized) Hamming distance from $y$, and therefore it can be written as $f(\delta(x,y))$, $0\le\delta(x,y)\le1$.

Next, we prove that $f(x,y)$ has the same value for a code vector $x$ and its complement $\bar x$:

$$
\begin{aligned}
f(\bar x,y)&=f(\delta(\bar x,y))=f(1-\delta(x,y))=\max_{0\le\theta\le1}f_\theta(1-\delta(x,y))\\
&=\max_{0\le\theta\le1}\big\{(1-\delta(x,y))\ln\theta+\delta(x,y)\ln(1-\theta)+\xi E^*(\theta)\big\}\\
&\overset{(a)}{=}\max_{0\le\tilde\theta\le1}\big\{(1-\delta(x,y))\ln(1-\tilde\theta)+\delta(x,y)\ln\tilde\theta+\xi E^*(1-\tilde\theta)\big\}\\
&\overset{(b)}{=}\max_{0\le\tilde\theta\le1}\big\{\delta(x,y)\ln\tilde\theta+(1-\delta(x,y))\ln(1-\tilde\theta)+\xi E^*(\tilde\theta)\big\}\\
&=\max_{0\le\theta\le1}f_\theta(\delta(x,y))=f(\delta(x,y))=f(x,y).
\end{aligned}
\qquad(76)
$$

In (a), we changed the variable in the maximization to $\tilde\theta=1-\theta$, and (b) is true since for the BSC model the ML error exponent $E^*(\theta)$ is symmetric around $\theta=\frac12$ (see (42)).
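The two properties used in the remainder of this proof, $f(\delta)=f(1-\delta)$ and the monotonicity of $f$ on $[0,\frac12]$, can be observed numerically. The sketch below is illustrative: $\xi=0.5$, $R=0.1$ and the grids are arbitrary choices, the maximization over $\theta$ is restricted to a grid that is symmetric around $\frac12$, $E^*(\theta)$ is computed from (42) by a grid search over $\rho$, and all function names are hypothetical.

```python
import math

def V(theta, alpha):
    return math.log((1 - theta) ** alpha + theta ** alpha)

def E_star(theta, R, steps=400):
    """E*(theta) for the BSC via (42), by a grid search over 0 <= rho <= 1."""
    return max(r * math.log(2) - (1 + r) * V(theta, 1 / (1 + r)) - r * R
               for r in (i / steps for i in range(steps + 1)))

def make_grid(xi, R, points=199):
    """Pairs (theta, xi * E*(theta)) on the grid theta = k / (points + 1), k = 1..points."""
    return [(t, xi * E_star(t, R)) for t in ((k + 1) / (points + 1) for k in range(points))]

def f_of_delta(delta, grid):
    """f(delta) of (75), with the maximization over theta restricted to the grid."""
    return max(delta * math.log(t) + (1 - delta) * math.log(1 - t) + b for t, b in grid)
```

Precomputing the grid once keeps the repeated evaluations of $f(\delta)$ cheap.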

Since both $f(\delta(x,y))$ and $\rho(\delta(x,y))$ take equal values at $\delta(x,y)$ and $1-\delta(x,y)$, it suffices to prove (74) for $x_1$ and $x_2$ satisfying $\delta(x_1,y) \le \frac12$ and $\delta(x_2,y) \le \frac12$ (so that $\rho(x_1,y) = \delta(x_1,y)$ and $\rho(x_2,y) = \delta(x_2,y)$).

In the rest of the proof, we denote $\delta(x_1,y) = \delta_1$ and $\delta(x_2,y) = \delta_2$. It is therefore sufficient to show that, for $0 \le \delta_1, \delta_2 \le \frac12$,

$$f(\delta_1) > f(\delta_2) \iff \delta_1 < \delta_2. \tag{77}$$



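We show this equivalence in two steps. First, we note that if $0 \le \delta_1 < \delta_2 \le \frac12$, then for every $0 < \theta < \frac12$,

$$\delta_1\ln\frac{\theta}{1-\theta} > \delta_2\ln\frac{\theta}{1-\theta}, \tag{78}$$

since $\ln\frac{\theta}{1-\theta} < 0$ in this range.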

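Adding $\ln(1-\theta) + \xi E^*(\theta)$ to both sides of (78) gives

$$\delta_1\ln\frac{\theta}{1-\theta} + \ln(1-\theta) + \xi E^*(\theta) > \delta_2\ln\frac{\theta}{1-\theta} + \ln(1-\theta) + \xi E^*(\theta), \tag{79}$$

or, equivalently,

$$\delta_1\ln\theta + (1-\delta_1)\ln(1-\theta) + \xi E^*(\theta) > \delta_2\ln\theta + (1-\delta_2)\ln(1-\theta) + \xi E^*(\theta). \tag{80}$$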
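The passage from (79) to (80) uses only the following identity, obtained by expanding the logarithm of the ratio (here $\delta$ stands for either $\delta_1$ or $\delta_2$):

```latex
\delta\ln\frac{\theta}{1-\theta} + \ln(1-\theta)
  = \delta\ln\theta - \delta\ln(1-\theta) + \ln(1-\theta)
  = \delta\ln\theta + (1-\delta)\ln(1-\theta).
```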

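Since (80) holds for every $\theta$ in this range, it holds in particular at the value of $\theta$ that maximizes its right-hand side; maximizing the left-hand side over $\theta$ can only increase it. Hence

$$\max_{0\le\theta\le\frac12}\Big\{\delta_1\ln\theta + (1-\delta_1)\ln(1-\theta) + \xi E^*(\theta)\Big\} > \max_{0\le\theta\le\frac12}\Big\{\delta_2\ln\theta + (1-\delta_2)\ln(1-\theta) + \xi E^*(\theta)\Big\}, \tag{81}$$

or

$$\max_{0\le\theta\le\frac12} f_\theta(\delta_1) > \max_{0\le\theta\le\frac12} f_\theta(\delta_2). \tag{82}$$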
To complete the proof, the maximization range over $\theta$ in (82) must be broadened to $0 \le \theta \le 1$. To justify this broadening, we make the following observation. Every $0 \le \delta \le \frac12$ satisfies, for all $\frac12 \le \theta < 1$,

$$\delta\ln\frac{\theta}{1-\theta} \le (1-\delta)\ln\frac{\theta}{1-\theta}, \tag{83}$$

since $\ln\frac{\theta}{1-\theta} \ge 0$ and $\delta \le 1-\delta$ in these ranges.
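Adding $\ln(1-\theta) + \xi E^*(\theta)$ to both sides of (83) gives

$$\delta\ln\frac{\theta}{1-\theta} + \ln(1-\theta) + \xi E^*(\theta) \le (1-\delta)\ln\frac{\theta}{1-\theta} + \ln(1-\theta) + \xi E^*(\theta), \tag{84}$$

or

$$\delta\ln\theta + (1-\delta)\ln(1-\theta) + \xi E^*(\theta) \le \delta\ln(1-\theta) + (1-\delta)\ln\theta + \xi E^*(\theta). \tag{85}$$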

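Using the fact that, for the BSC model, the ML error exponent $E^*(\theta)$ is symmetric about $\theta = \frac12$ (see (42)), we can rewrite (85) as

$$\delta\ln\theta + (1-\delta)\ln(1-\theta) + \xi E^*(\theta) \le \delta\ln(1-\theta) + (1-\delta)\ln\theta + \xi E^*(1-\theta), \tag{86}$$

or

$$f_\theta\big(\delta(x,y)\big) \le f_{1-\theta}\big(\delta(x,y)\big). \tag{87}$$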
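Subtracting the two sides of (86), the exponent terms cancel whenever $E^*(\theta) = E^*(1-\theta)$, leaving $f_{1-\theta}(\delta) - f_\theta(\delta) = (1-2\delta)\ln\frac{\theta}{1-\theta}$, which is non-negative for $0 \le \delta \le \frac12$ and $\frac12 \le \theta < 1$. The sketch below checks this on a grid. `placeholder_exponent` is a hypothetical symmetric stand-in for $E^*(\theta)$ and $\xi = 0.5$ is an arbitrary choice; since only the symmetry matters here, any symmetric function would do.

```python
import math

XI = 0.5  # hypothetical fraction; any value works because the exponent terms cancel


def placeholder_exponent(theta):
    """Hypothetical stand-in for E*(theta); only E(theta) = E(1 - theta) matters."""
    return (theta - 0.5) ** 2


def f_theta(delta, theta):
    """f_theta(delta) = delta ln(theta) + (1-delta) ln(1-theta) + xi E(theta)."""
    return (delta * math.log(theta) + (1.0 - delta) * math.log(1.0 - theta)
            + XI * placeholder_exponent(theta))


thetas = [0.5 + 0.499 * k / 500 for k in range(501)]   # 1/2 <= theta <= 0.999
deltas = [j / 20 for j in range(11)]                    # 0 <= delta <= 1/2

# Smallest value of f_{1-theta}(delta) - f_theta(delta) over the grid (should be >= 0).
worst = min(f_theta(d, 1.0 - t) - f_theta(d, t) for d in deltas for t in thetas)

# Largest deviation from the closed form (1 - 2 delta) ln(theta / (1 - theta)).
max_err = max(abs(f_theta(d, 1.0 - t) - f_theta(d, t)
                  - (1.0 - 2.0 * d) * math.log(t / (1.0 - t)))
              for d in deltas for t in thetas)
print(worst, max_err)
```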

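The meaning of (87) is that when $0 \le \delta \le \frac12$, for each $\frac12 \le \theta < 1$, $f_\theta(\delta)$ is upper bounded by $f_{1-\theta}(\delta)$, where $0 < 1-\theta \le \frac12$. Thus, the maximum of $f_\theta(\delta)$ over $0 \le \theta \le 1$ is attained by some $\theta \in [0, \frac12]$. Therefore, (82) finally becomes

$$\max_{0\le\theta\le1} f_\theta(\delta_1) > \max_{0\le\theta\le1} f_\theta(\delta_2), \tag{88}$$

and thus

$$0 \le \delta_1 < \delta_2 \le \tfrac12 \implies f(\delta_1) > f(\delta_2). \tag{89}$$

The converse direction of (77) follows by exchanging the roles of $\delta_1$ and $\delta_2$: if $\delta_1 \ge \delta_2$, then either $\delta_1 = \delta_2$, so that $f(\delta_1) = f(\delta_2)$, or $\delta_2 < \delta_1$, so that $f(\delta_2) > f(\delta_1)$ by (89); in neither case does $f(\delta_1) > f(\delta_2)$ hold. This completes the proof.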
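To illustrate (74) and (89) end to end, the sketch below uses a stand-in for $E^*(\theta)$, namely Gallager's random-coding exponent of the BSC with uniform inputs (an assumption in place of (42)), together with hypothetical parameters $R = 0.1$ nats and $\xi = 0.5$. It checks that $f(\delta)$ decreases on a grid of $\delta \in [0, \frac12]$, and that on a small random binary code the codeword maximizing $f(\delta(x,y))$ also minimizes $\rho(x,y)$. Here we assume $\rho(x,y) = \min\{\delta(x,y), 1-\delta(x,y)\}$, which agrees with $\rho = \delta$ for $\delta \le \frac12$ and with the complement symmetry used above; the exact definition is in (17).

```python
import math
import random

# Hypothetical parameters (not from the paper): rate in nats and the fraction xi.
R, XI = 0.1, 0.5
K = 2000
THETAS = [(k + 0.5) / K for k in range(K)]     # brute-force grid over (0, 1)
RHOS = [j / 100 for j in range(101)]


def gallager_exponent(theta, rate):
    """Stand-in for E*(theta): Gallager's E_r(R) of a BSC(theta), uniform inputs."""
    best = 0.0
    for rho in RHOS:
        s = 1.0 / (1.0 + rho)
        e0 = rho * math.log(2.0) - (1.0 + rho) * math.log(theta**s + (1.0 - theta)**s)
        best = max(best, e0 - rho * rate)
    return best


E_STAR = [gallager_exponent(t, R) for t in THETAS]


def f_of_delta(delta):
    """Metric f(delta) of (75), maximized over the theta grid."""
    return max(delta * math.log(t) + (1.0 - delta) * math.log(1.0 - t) + XI * e
               for t, e in zip(THETAS, E_STAR))


def rho_of_delta(delta):
    """Assumed form of rho(x, y): min{delta, 1 - delta}; check against (17)."""
    return min(delta, 1.0 - delta)


# (89): f should be strictly decreasing on [0, 1/2].
f_grid = [f_of_delta(j / 20) for j in range(11)]

# (74): on a small random code, the codeword chosen by Omega should also minimize rho.
random.seed(1)
M, N = 8, 16                                   # hypothetical code size and block length
code = [[random.randint(0, 1) for _ in range(N)] for _ in range(M)]
y = [random.randint(0, 1) for _ in range(N)]
deltas = [sum(a != b for a, b in zip(x, y)) / N for x in code]
metrics = [f_of_delta(d) for d in deltas]
omega_choice = max(range(M), key=metrics.__getitem__)
rho_values = [rho_of_delta(d) for d in deltas]
print(omega_choice, rho_values[omega_choice], min(rho_values))
```

Ties between a codeword and one at the complementary distance are broken arbitrarily by `max`, but both have the same $\rho$, so the comparison is made on $\rho$ values rather than on indices.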